<!-- Deployments (wikitech.wikimedia.org, MediaWiki 1.46.0-wmf.22); revision 2398803 of 2026-04-03T14:03:48Z by ScheduleDeploymentBot: Add [[gerrit:1264856]] to Tuesday, April 07 UTC late backport window -->
{{Navigation MediaWiki deployment}}
This page tracks '''upcoming''' '''deployments''' of software to the [[:m:Special:SiteMatrix|Wikimedia Foundation servers]].

== Getting started ==
Ensure you have joined the {{irc|wikimedia-operations}} IRC channel, as all deployment-related communication happens there. If you need help, contact [[:mw:Wikimedia Release Engineering Team|Release Engineering]] on IRC at {{irc|wikimedia-releng}} and ping Tyler (<code>thcipriani</code>).

* '''MediaWiki is deployed weekly''' through the [[/Train|Deployment Train]]. Other services follow their own schedule.
* '''Times are pinned to San Francisco''', so the UTC times shift in March and November with [[:en:Daylight saving time in the United States|DST]].
* '''Prefer regular [[Backport windows]]''' over adding new windows. To request deployment of a config change or backport, add your username and Gerrit URL to one of the backport windows on this page. You must be online in #wikimedia-operations on IRC during your deployment and must install [[WikimediaDebug]] ahead of time. The #wikimedia-operations channel requires you to [[:m:IRC/Instructions#Register your nickname, identify, and enforce|register your nickname]] before you can join.
** You can use the '''backport scheduling tool''' to more easily edit this page: <div style="text-align: center; margin: 1em 0">{{Clickable button 2|:toollabs:schedule-deployment|Schedule a backport|class=mw-ui-progressive}}</div>
* Tasks that meet the [[/Inclusion criteria|inclusion criteria]], including long-running tasks, '''require their own windows'''. '''Schedule more time''' than you think you need, to account for delays and setbacks; we recommend one hour for most tasks.
** To create or modify a recurring deploy window, send a patchset to the [[:gitlab:repos/releng/release/-/blob/main/make-deployment-calendar/deployments-calendar.yaml|deployments-calendar.yaml file]] in <code>repos/releng/release.git</code>.
** To create a one-off window, edit this page accordingly.
** '''Announce''' changes to the [[mail:ops|ops mailing list]] ahead of time if you anticipate or are uncertain about noticeable impacts to database load, HTTP caching, or the introduction of new cookies.
** '''Announce''' deployments of major features to the community via [[:m:Tech/News/Next|Tech News]] and/or via other [[:mw:Wikimedia_Product_Guidance/Communication_channels|Product communication channels]].
* '''Something went wrong?''' See [[Incident response]]. Is there a user-impacting problem? Communicate in the {{irc|wikimedia-operations}} IRC channel. If there is a Phabricator task, ensure [[:phab:tag/wikimedia-incident/|#Wikimedia-Incident]] is tagged, and consider setting the [[:mw:Phabricator/Project_management#Priority_levels|Unbreak Now]] priority.
__TOC__
{{anchor|Next Week|Near Term|Near term|Near-term}}{{clear}}
[[Category:Deployment]]
{{Note|content=Subscribe in Google Calendar via <code>wikimedia.org_rudis09ii2mm5fk4hgdjeh1u64@group.calendar.google.com</code>.<br>This may not include one-off windows.
'''If there are differences, then the wiki page is canonical and correct'''.}} ==Week of March 30== ==={{Deployment_day|date=2026-03-29}}=== {{Deployment calendar event card |when=2026-03-29 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-03-30}}=== {{Deployment calendar event card |when=2026-03-30 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-30 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-30 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|kostajh|kostajh}} {{deploy|type=1.46.0-wmf.21|gerrit=1264578|title=hCaptcha: Add APCu cache layer to health checker|statusd=}} - {{phabricator|T421204}} {{phabricator|T412947}} {{ircnick|Raine|Raine}} {{deploy|type=config|gerrit=1262091|title=Enable $wgTempCategoryCollations for s3 wikis.|status=}} - {{phabricator|T419274}} {{phabricator|T419049}} {{ircnick|MichaelG_WMF|Michael Grosse (dev @ WMF Growth team)}} {{deploy|type=1.46.0-wmf.21|gerrit=1264590|title=instrument(ReviseTone): record start of copyedit session|status=d}} - {{phabricator|T419181}} {{ircnick|James_F|James_F}} {{deploy|type=1.46.0-wmf.21|gerrit=1261477|title=Replace WANObjectCache with new MemcachedWrapper concept|status=d}} - {{phabricator|T419666}} {{deploy|type=1.46.0-wmf.21|gerrit=1262199|title=Fix match case for setting minute, week or month TTL on OrchestratorRequest|status=d}} - {{phabricator|T421475}} {{deploy|type=config|gerrit=1256432|title=Wikifunctions: Switch cache from mcrouter-wikifunctions to special access|status=nd}} - {{phabricator|T419666}} {{ircnick|eileen-m__|eileen-m__}} {{deploy|type=1.46.0-wmf.21|gerrit=1264605|title=Instrumentation: Track clicks for user account menu experiment|status=d}} - {{phabricator|T418053}} {{deploy|type=1.46.0-wmf.21|gerrit=1264625|title=Display create account button in main menu when user is logged out.|status=d}} - {{phabricator|T418053}} {{phabricator|T415647}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-30 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic 
start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-03-30 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-03-30 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-30 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-03-30 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=config|gerrit=1261526|title=config: Enable EmailConfirmationBanner on mediawikiwiki|status=}} - {{phabricator|T421366}} {{deploy|type=config|gerrit=1264662|title=config: Enable EmailConfirmationBanner on testwiki|status=}} - {{phabricator|T421366}} {{ircnick|tchin|tchin}} {{deploy|type=config|gerrit=1262303|title=[EventStreamConfig] Add product_metrics.web_base.active_reader_baseline stream|status=}} - {{phabricator|T420621}} {{ircnick|Nemoralis|Nemoralis}} {{deploy|type=config|gerrit=1264652|title=Add delete-redirect to filemovers on Wikimedia Commons|status=}} - {{phabricator|T421373}} {{ircnick|cjming|cjming}} {{deploy|type=config|gerrit=1264653|title=Add TestKitchenExposureResetEpoch config variable|status=}} - {{phabricator|T414738}} {{ircnick|annet|annet}} 
{{deploy|type=config|gerrit=1264630|title=Add event stream for logged-in reader retention experiment|status=}} - {{phabricator|T420490}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-30 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2026-03-30 16:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-30 17:00 SF |length=1 |window=Abstract Wikipedia off-cadence backend deployment |who=Abstract Wikipedia |what=Extra backend deployment to ensure that recent changes work as expected in prod }} {{Deployment calendar event card |when=2026-03-30 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.46.0-wmf.22</code> }} {{Deployment calendar event card |when=2026-03-30 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.46.0-wmf.22</code> to testwikis }} {{Deployment calendar event card |when=2026-03-30 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-03-30 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) 
|who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-30 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-31}}=== {{Deployment calendar event card |when=2026-03-31 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-31 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.21->1.46.0-wmf.22|1.46.0-wmf.21|1.46.0-wmf.21}} * group0 to [[mw:MediaWiki_1.46/wmf.22|1.46.0-wmf.22]] * '''Blockers: {{phabricator|T420480}}''' }} {{Deployment calendar event card |when=2026-03-31 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-31 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-31 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Raine|Raine}} {{deploy|type=config|gerrit=1262091|title=Enable $wgTempCategoryCollations for s3 wikis.|status=}} - {{phabricator|T419274}} {{phabricator|T419049}} {{ircnick|xSavitar|xSavitar}} {{deploy|type=1.46.0-wmf.22|gerrit=1265367|title=Set a JWT cookie for OAuth 1 and OAuth 2 owner-only requests|status=}} - {{phabricator|T417833}} {{deploy|type=1.46.0-wmf.22|gerrit=1265368|title=tests: OAuth1 and OAuth2 owner-only JWT support|status=}} - {{phabricator|T417833}} {{phabricator|T415281}} {{deploy|type=1.46.0-wmf.22|gerrit=1265369|title=tests: Add test for asserting JWT cookie not set for OAuth2 consumers|status=}} - {{phabricator|T417833}} {{phabricator|T415281}} {{deploy|type=config|gerrit=1260006|title=Enable JWTs for OAuth1 consumers and OAuth2 owner-only consumers|status=}} - {{phabricator|T417833}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-31 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-03-31 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-31 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-03-31 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1238312|title=Test Kitchen SLOs: Renaming slos because of the Test Kitchen renaming |status=}} - {{phabricator|T414381}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-31 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-31 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|AaronSchulz|AaronSchulz}} {{deploy|type=config|gerrit=1261732|title=Move all analytics API sandbox entries to testwiki|status=}} - {{phabricator|T419429}} {{ircnick|manfredi|manfredi}} {{deploy|type=1.46.0-wmf.22|gerrit=1264921|title=Email confirmation banner: Add Test Kitchen A/B gating|status=}} - {{phabricator|T421366}} {{deploy|type=1.46.0-wmf.22|gerrit=1264922|title=Add instrumentation for email confirmation lifecycle events|status=}} - {{phabricator|T420007}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-31 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-31 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-04-01}}=== {{Deployment calendar event card |when=2026-04-01 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-01 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.22|1.46.0-wmf.21->1.46.0-wmf.22|1.46.0-wmf.21}} * group1 to [[mw:MediaWiki_1.46/wmf.22|1.46.0-wmf.22]] * '''Blockers: {{phabricator|T420480}}''' }} {{Deployment calendar event card |when=2026-04-01 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who={{ircnick|dusen|daniel}}, {{ircnick|effie|effie}} |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
* Daniel deploying REST gateway updates, five patches starting at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1260763/4 }} {{Deployment calendar event card |when=2026-04-01 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-04-01 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-01 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-01 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-01 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-01 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=config|gerrit=1266314|title=config: Enable EmailConfirmationBanner on mediawikiwiki|status=}} - {{phabricator|T421366}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-01 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-01 15:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-01 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-01 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-04-02}}=== {{Deployment calendar event card |when=2026-04-02 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|georgekyz|georgekyz}} {{deploy|type=config|gerrit=1266228|title=EventStreamConfig: Add rr-multilingual prediction_change stream|status=}} - {{phabricator|T415892}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-02 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.22|1.46.0-wmf.22|1.46.0-wmf.21->1.46.0-wmf.22}} * group2 to [[mw:MediaWiki_1.46/wmf.22|1.46.0-wmf.22]] * '''Blockers: {{phabricator|T420480}}''' }} {{Deployment calendar event card |when=2026-04-02 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
{{ircnick|dues|DKinzler_(WMF)}} {{deploy|type=chart|gerrit=1266237|title=rest gateway: define authed-user class|status=}} - {{phabricator|T420280}} {{phabricator|T419796}} {{deploy|type=chart|gerrit=1265333|title=introduce policy for abstractwiki/wikifunctions|status=}} - {{phabricator|T421581}} }} {{Deployment calendar event card |when=2026-04-02 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-04-02 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=config|gerrit=1261516|title=config: Enable EmailConfirmationBanner on selected wikis|status=}} - {{phabricator|T421366}} {{ircnick|HouseOfM|HouseOfM}} {{deploy|type=config|gerrit=1266964|title=Enable the CampaignEvents extension on incubator|status=}} - {{phabricator|T421749}} {{ircnick|edsanders|edsanders}} {{deploy|type=1.46.0-wmf.22|gerrit=1266985|title=Fix suggestion mode availability check|status=}} - {{phabricator|T422143}} {{ircnick|bwang|bwang}} {{deploy|type=1.46.0-wmf.22|gerrit=1267008|title=Add logged-in reader retention instrument|status=}} - {{phabricator|T420490}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.46.0-wmf.22|gerrit=1267056|title=hCaptcha: Emit Prometheus counter on health check failover|status=}} - {{phabricator|T421204}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-02 07:00 SF |length=1 |window=DC Switchover: Day 8 - Codfw Repool |who={{ircnick|jasmine_}} |what=Codfw Repool }} {{Deployment calendar event card |when=2026-04-02 07:30 SF |length=0.5 |window=Test Kitchen Experiment 
Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-02 08:00 SF |length=1 |window=Train log triage |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-04-02 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-04-02 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-04-02 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-02 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|nya_1F616EMO|1F616EMO}} {{deploy|type=config|gerrit=1264569|title=zhwikinews: 20th anniversary logo change|status=}} - {{phabricator|T420165}} {{deploy|type=config|gerrit=1265959|title=arbcom_zhwiki: Enable SecurePoll without PII rights|status=}} - {{phabricator|T419309}} {{ircnick|bwang|bwang}} {{deploy|type=1.46.0-wmf.22|gerrit=1267008|title=Add logged-in reader retention instrument|status=}} - {{phabricator|T420490}} {{ircnick|kemayo|David L}} {{deploy|type=1.46.0-wmf.22|gerrit=1267204|title=SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-02 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-02 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-04-03}}=== {{Deployment calendar event card |when=2026-04-03 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} {{Deployment calendar event card |when=2026-04-03 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-04-04}}=== {{Deployment calendar event card |when=2026-04-04 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of April 06== ==={{Deployment_day|date=2026-04-05}}=== {{Deployment calendar event card |when=2026-04-05 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-04-06}}=== {{Deployment calendar event card |when=2026-04-06 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-06 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-06 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-06 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-06 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-04-06 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-06 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... 
}} {{Deployment calendar event card |when=2026-04-06 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-06 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2026-04-06 16:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-06 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.46.0-wmf.23</code> }} {{Deployment calendar event card |when=2026-04-06 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.46.0-wmf.23</code> to testwikis }} {{Deployment calendar event card |when=2026-04-06 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-04-06 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) 
|who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-06 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-04-07}}=== {{Deployment calendar event card |when=2026-04-07 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-07 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-07 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-04-07 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-07 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-04-07 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-07 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-04-07 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-04-07 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-07 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.22->1.46.0-wmf.23|1.46.0-wmf.22|1.46.0-wmf.22}} * group0 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-07 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|hyang|hyang}} {{deploy|type=config|gerrit=1264856|title=REST: Publish ReadingLists v0 module in REST Sandbox|status=}} - {{phabricator|T419619}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-07 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-07 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-04-08}}=== {{Deployment calendar event card |when=2026-04-08 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-08 01:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23|1.46.0-wmf.22->1.46.0-wmf.23|1.46.0-wmf.22}} * group1 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-08 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-08 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-04-08 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-08 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-08 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-08 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-08 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23|1.46.0-wmf.22->1.46.0-wmf.23|1.46.0-wmf.22}} * group1 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-08 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-08 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-08 15:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-08 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-08 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-04-09}}=== {{Deployment calendar event card |when=2026-04-09 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-09 01:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23|1.46.0-wmf.23|1.46.0-wmf.22->1.46.0-wmf.23}} * group2 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-09 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-09 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-04-09 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-09 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-09 08:00 SF |length=1 |window=Train log triage |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-04-09 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-04-09 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-04-09 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-09 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23|1.46.0-wmf.23|1.46.0-wmf.22->1.46.0-wmf.23}} * group2 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-09 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-09 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-09 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-04-10}}=== {{Deployment calendar event card |when=2026-04-10 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} {{Deployment calendar event card |when=2026-04-10 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-04-11}}=== {{Deployment calendar event card |when=2026-04-11 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} 3821t67qi9876lzfee7ar2fkw5bhgi0 2398838 2398803 2026-04-04T02:00:17Z DeploymentCalendarTool 20896 Remove Week of March 30 2398838 wikitext text/x-wiki {{Navigation MediaWiki deployment}} This page tracks '''upcoming''' '''deployments''' of software to the [[:m:Special:SiteMatrix|Wikimedia Foundation servers]]. == Getting started == Ensure you have joined the {{irc|wikimedia-operations}} IRC channel, as all deployment-related communications happen there. If you need help, contact [[:mw:Wikimedia Release Engineering Team|Release Engineering]] on IRC at {{irc|wikimedia-releng}} and ping Tyler (<code>thcipriani</code>). * '''MediaWiki is deployed weekly''' through the [[/Train|Deployment Train]]. Other services follow their own schedule. * '''Times are pinned to San Francisco''', so the corresponding UTC times shift in March and November with [[:en:Daylight saving time in the United States|DST]]. * '''Prefer regular [[Backport windows]]''' over adding new windows. To request deployment of a config change or backport, add your username and Gerrit URL to one of the backport windows on this page. You must be online in #wikimedia-operations on IRC during your deployment and must install [[WikimediaDebug]] ahead of time. The #wikimedia-operations channel requires you to [[:m:IRC/Instructions#Register your nickname, identify, and enforce|register your nickname]] before you can join. 
** You can use the '''backport scheduling tool''' to more easily edit this page: <div style="text-align: center; margin: 1em 0">{{Clickable button 2|:toollabs:schedule-deployment|Schedule a backport|class=mw-ui-progressive}}</div> * Tasks that meet the [[/Inclusion criteria|Inclusion criteria]], including long-running tasks, '''require their own windows'''. '''Schedule more time''' than you think you need to account for delays and setbacks; we recommend one hour for most tasks. **To create or modify a recurring deploy window, send a patchset to the [[:gitlab:repos/releng/release/-/blob/main/make-deployment-calendar/deployments-calendar.yaml|deployments-calendar.yaml file]] in <code>repos/releng/release.git</code>. **To create a one-off window, simply edit this page accordingly. ** '''Announce''' changes to the [[mail:ops|ops mailing list]] ahead of time if you anticipate, or are uncertain about, noticeable impacts to database load, HTTP caching, or the introduction of new cookies. ** '''Announce''' deployments of major features to the community via [[:m:Tech/News/Next|Tech News]] and/or via other [[:mw:Wikimedia_Product_Guidance/Communication_channels|Product communication channels]]. * '''Something went wrong?''' See [[Incident response]]. Is there a user-impacting problem? Communicate in the {{irc|wikimedia-operations}} IRC channel. If there is a Phabricator task, ensure [[:phab:tag/wikimedia-incident/|#Wikimedia-Incident]] is tagged, and consider setting the [[:mw:Phabricator/Project_management#Priority_levels|Unbreak Now]] priority. __TOC__ {{anchor|Next Week|Near Term|Near term|Near-term}}{{clear}} [[Category:Deployment]] {{Note|content=Subscribe in Google Calendar via <code>wikimedia.org_rudis09ii2mm5fk4hgdjeh1u64@group.calendar.google.com</code>.<br>This may not include one-off windows. 
'''If there are differences, then the wiki page is canonical and correct'''.}} ==Week of April 06== ==={{Deployment_day|date=2026-04-05}}=== {{Deployment calendar event card |when=2026-04-05 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-04-06}}=== {{Deployment calendar event card |when=2026-04-06 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-06 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-06 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-06 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-04-06 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-04-06 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-06 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-04-06 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-06 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-04-06 16:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-06 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.46.0-wmf.23</code> }} {{Deployment calendar event card |when=2026-04-06 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.46.0-wmf.23</code> to testwikis }} {{Deployment calendar event card |when=2026-04-06 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-04-06 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-06 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-04-07}}=== {{Deployment calendar event card |when=2026-04-07 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-07 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-07 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-04-07 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-07 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-04-07 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-07 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-04-07 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-04-07 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-07 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.22->1.46.0-wmf.23|1.46.0-wmf.22|1.46.0-wmf.22}} * group0 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-07 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|hyang|hyang}} {{deploy|type=config|gerrit=1264856|title=REST: Publish ReadingLists v0 module in REST Sandbox|status=}} - {{phabricator|T419619}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-07 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-07 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-04-08}}=== {{Deployment calendar event card |when=2026-04-08 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-08 01:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23|1.46.0-wmf.22->1.46.0-wmf.23|1.46.0-wmf.22}} * group1 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-08 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-08 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-04-08 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-08 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-08 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-08 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-08 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23|1.46.0-wmf.22->1.46.0-wmf.23|1.46.0-wmf.22}} * group1 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-08 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-08 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-08 15:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-08 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-08 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-04-09}}=== {{Deployment calendar event card |when=2026-04-09 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-09 01:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version (secondary timeslot) |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23|1.46.0-wmf.23|1.46.0-wmf.22->1.46.0-wmf.23}} * group2 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-09 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-09 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-04-09 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-09 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-09 08:00 SF |length=1 |window=Train log triage |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-04-09 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-04-09 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-04-09 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-09 11:00 SF |length=2 |window=MediaWiki train - Utc-7+Utc-0 Version |who={{ircnick|dancy|Ahmon}}, {{ircnick|jnuche|Jaime}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23|1.46.0-wmf.23|1.46.0-wmf.22->1.46.0-wmf.23}} * group2 to [[mw:MediaWiki_1.46/wmf.23|1.46.0-wmf.23]] * '''Blockers: {{phabricator|T420481}}''' }} {{Deployment calendar event card |when=2026-04-09 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-09 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-09 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-04-10}}=== {{Deployment calendar event card |when=2026-04-10 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} {{Deployment calendar event card |when=2026-04-10 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-04-11}}=== {{Deployment calendar event card |when=2026-04-11 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of April 13== ==={{Deployment_day|date=2026-04-12}}=== {{Deployment calendar event card |when=2026-04-12 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-04-13}}=== {{Deployment calendar event card |when=2026-04-13 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-13 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-13 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-13 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-04-13 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-04-13 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-13 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... 
}} {{Deployment calendar event card |when=2026-04-13 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-13 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2026-04-13 16:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-13 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.46.0-wmf.24</code> }} {{Deployment calendar event card |when=2026-04-13 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.46.0-wmf.24</code> to testwikis }} {{Deployment calendar event card |when=2026-04-13 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-04-13 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) 
|who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-13 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-04-14}}=== {{Deployment calendar event card |when=2026-04-14 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-14 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-14 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-04-14 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-14 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-04-14 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. }} {{Deployment calendar event card |when=2026-04-14 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-04-14 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-04-14 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-14 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|dduvall|Dan}}, {{ircnick|dancy|Ahmon}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.23->1.46.0-wmf.24|1.46.0-wmf.23|1.46.0-wmf.23}} * group0 to [[mw:MediaWiki_1.46/wmf.24|1.46.0-wmf.24]] * '''Blockers: {{phabricator|T420482}}''' }} {{Deployment calendar event card |when=2026-04-14 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-14 14:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-14 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-04-15}}=== {{Deployment calendar event card |when=2026-04-15 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-15 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-15 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-04-15 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-15 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-15 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-04-15 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-15 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|dduvall|Dan}}, {{ircnick|dancy|Ahmon}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.24|1.46.0-wmf.23->1.46.0-wmf.24|1.46.0-wmf.23}} * group1 to [[mw:MediaWiki_1.46/wmf.24|1.46.0-wmf.24]] * '''Blockers: {{phabricator|T420482}}''' }} {{Deployment calendar event card |when=2026-04-15 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-15 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-15 15:00 SF |length=1 |window=Readers deployment window |who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-15 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-15 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-04-16}}=== {{Deployment calendar event card |when=2026-04-16 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-16 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-16 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-04-16 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-16 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [[Test Kitchen]]. 
}} {{Deployment calendar event card |when=2026-04-16 08:00 SF |length=1 |window=Train log triage |who={{ircnick|dduvall|Dan}}, {{ircnick|dancy|Ahmon}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-04-16 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-04-16 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-04-16 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-04-16 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|dduvall|Dan}}, {{ircnick|dancy|Ahmon}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.24|1.46.0-wmf.24|1.46.0-wmf.23->1.46.0-wmf.24}} * group2 to [[mw:MediaWiki_1.46/wmf.24|1.46.0-wmf.24]] * '''Blockers: {{phabricator|T420482}}''' }} {{Deployment calendar event card |when=2026-04-16 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-16 14:00 SF |length=1 |window=Readers deployment window 
|who=Readers |what=NOTE: often skipped, the reader teams do not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-16 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-04-17}}=== {{Deployment calendar event card |when=2026-04-17 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-04-17 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-04-18}}=== {{Deployment calendar event card |when=2026-04-18 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }}
Server Admin Log 0 7919 2398784 2398780 2026-04-03T12:16:06Z Stashbot 7414 jclark@cumin1003: START - Cookbook sre.dns.netbox 2398784 wikitext text/x-wiki
== 2026-04-03 ==
* 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox
* 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie
* 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage
* 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage
* 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie
* 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie
* 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage
* 09:54 brouberol@dns1004: END - running authdns-update
* 09:52 brouberol@dns1004: START - running authdns-update
* 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage
* 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie
* 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie
* 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage
* 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on
kafka-test1008.eqiad.wmnet with reason: host reimage
* 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie
* 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie
* 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage
* 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage
* 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]]
* 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s)
* 00:53 zabe@deploy1003: zabe: Continuing with sync
* 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]
== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]]
* 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s)
* 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync
* 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]]
* 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s)
* 21:28 kemayo@deploy1003: kemayo: Continuing with sync
* 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]]
* 21:18 kemayo@deploy1003: kemayo: Continuing with sync
* 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]]
* 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s)
* 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync
* 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]]
* 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s)
* 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync
* 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]]
* 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]]
* 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 18:56 cmooney@dns2005: END - running authdns-update
* 18:55 cmooney@dns2005: START - running authdns-update
* 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003"
* 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003"
* 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica
* 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica
* 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 18:00 ebysans@deploy1003: helmfile [eqiad] DONE
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s)
* 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync
* 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]]
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 00:56 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - 
Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
== 2026-04-03 == * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook
sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see 
https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]]
* 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json
* 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]]
* 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885
* 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s)
* 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync
* 07:55 jmm@dns1004: END - running authdns-update
* 07:54 jmm@dns1004: START - running authdns-update
* 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with
reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 
2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). 
Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section
* 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json
* 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json
* 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section
* 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section
* 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section
* 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet
* 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet
* 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:13 volans@dns1004: END - running authdns-update
* 12:11 volans@dns1004: START - running authdns-update
* 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374
* 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json
* 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json
* 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json
* 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json
* 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json
* 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json
* 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json
* 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json
* 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 10:37 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
* 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json
* 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
* 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json
* 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet
* 10:19 moritzm: installing freetype security updates
* 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet
* 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
* 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
* 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
* 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json
* 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json
* 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
* 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
* 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
* 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync
* 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json
* 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw
* 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
* 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
* 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw
* 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json
* 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json
* 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]]
* 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json
* 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]]
* 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885
* 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s)
* 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync
* 07:55 jmm@dns1004: END - running authdns-update
* 07:54 jmm@dns1004: START - running authdns-update
* 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s)
* 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync
* 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]]
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 00:56 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]]
* 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw
* 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027
* 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027
* 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027
* 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002"
* 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002"
* 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s)
* 21:38 swfrench@deploy1003: swfrench: Continuing with sync
* 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]]
* 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox
* 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027
* 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
* 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates
* 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye
* 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica
* 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica
* 20:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage
* 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage
* 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s)
* 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync
* 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]]
* 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010
* 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
* 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
* 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002"
* 19:58 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002"
* 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox
* 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010
* 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
* 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye
* 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
* 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply
* 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
* 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
* 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage
* 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
* 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage
* 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
* 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs
* 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s)
* 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]]
* 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009
* 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009
* 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009
* 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002"
* 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002"
* 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s)
* 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83]
* 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s)
* 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83]
* 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox
* 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009
* 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
* 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s)
* 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83]
* 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
* 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
* 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
* 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
* 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s)
* 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]]
* 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy (exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]])
* 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s)
* 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
* 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]]
* 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards
* 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]])
* 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s)
* 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync
* 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]]
* 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
* 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
* 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007
* 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007
* 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002"
* 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002"
* 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox
* 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]])
* 15:26 fabfur@cumin1003: 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 
elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki 
(T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP
* 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json
* 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json
* 14:42 moritzm: installing libxml-parser-perl security updates
* 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json
* 14:28 moritzm: installing pyasn1 security updates
* 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - 
Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript 
updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards
* 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards
* 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s)
* 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
* 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
* 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 15:09 jforrester@deploy1003: jforrester: Continuing with sync
* 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
7044uajvezjad0mqlw7vjm4gws4pbog 2398792 2398791 2026-04-03T13:08:13Z Stashbot 7414 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) 2398792 wikitext text/x-wiki
== 2026-04-03 ==
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host
kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # 
[[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki
* 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]]
* 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s)
* 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]]

== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>
3q92xh64s284b6k82zj312wtsh8t0rf 2398793 2398792 2026-04-03T13:22:23Z Stashbot 7414 vriley@cumin1003: START - Cookbook sre.dns.netbox 2398793 wikitext text/x-wiki

== 2026-04-03 ==
* 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003"
* 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003"
* 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
* 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
* 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox
* 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie
* 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage
* 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage
* 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 10:19 elukey@cumin1003: START - Cookbook
sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript 
updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]]
* 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s)
* 00:53 zabe@deploy1003: zabe: Continuing with sync
* 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]]
* 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s)
* 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync
* 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]]
* 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s)
* 21:28 kemayo@deploy1003: kemayo: Continuing with sync
* 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]]
* 21:18 kemayo@deploy1003: kemayo: Continuing with sync
* 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]]
* 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s)
* 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync
* 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]]
* 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s)
* 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync
* 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json
* 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json
* 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json
* 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet
* 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json
* 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad]
START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap 
build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 
urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 
mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s)
* 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync
* 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]]
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 00:56 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage *
10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 
08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook
sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet
* 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet
* 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:13 volans@dns1004: END - running authdns-update
* 12:11 volans@dns1004: START - running authdns-update
* 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374
* 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json
* 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json
* 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera 
generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host 
kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]]
* 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s)
* 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync
* 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]]
* 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json
* 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json
* 12:17 kart_: Updated cxserver to 2026-03-25-072715-production
* 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
* 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
* 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json
* 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:01 fceratto@cumin1003: dbctl commit
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 5q90tfn59esnstdss26nd116jxjpnxu 2398799 2398798 2026-04-03T13:38:24Z Stashbot 7414 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED 2398799 wikitext text/x-wiki == 2026-04-03 == * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: 
"Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki
* 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]]
* 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s)
* 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]]

== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

== 2026-04-03 ==
* 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058
* 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058
* 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003"
* 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003"
* 13:03 vriley@cumin1003: START - 
Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host 
kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]]
* 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s)
* 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync
* 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP
* 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json
* 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json
* 14:42 moritzm: installing libxml-parser-perl security updates
* 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json
* 14:28 moritzm: installing pyasn1 security updates
* 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook 
sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]]
* 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s)
* 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync
* 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: 
START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki 
(T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374
* 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json
* 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json
* 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END 
(PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 
brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on 
testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 
vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running 
authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for 
Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]]
* 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json
* 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json
* 12:17 kart_: Updated cxserver to 2026-03-25-072715-production
* 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
* 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
* 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json
* 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:01 fceratto@cumin1003: dbctl commit
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki
* 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]]
* 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s)
* 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]]

== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

== 2026-04-03 ==
* 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
* 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]]
* 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie
* 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058
* 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058
* 13:25
vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet 
with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates 
on testcommonswiki (T422062)]] (duration: 06m 50s)
* 00:53 zabe@deploy1003: zabe: Continuing with sync
* 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP
* 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json
* 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json
* 14:42 moritzm: installing libxml-parser-perl security updates
* 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json
* 14:28 moritzm: installing pyasn1 security updates
* 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye
* 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json
* 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet
* 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet
* 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet
* 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
* 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json
* 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
* 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
* 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
* 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json
* 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]]
* 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json
* 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json
* 12:17 kart_: Updated cxserver to 2026-03-25-072715-production
* 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
* 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
* 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json
* 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json
* 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json
* 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json
* 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance
* 11:33 moritzm: installing tomcat10 security updates
* 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
* 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json
* 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
* 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
* 11:12 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
* 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
* 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json
* 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
* 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
* 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json
* 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie
* 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json
* 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json
* 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
* 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json
* 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm
* 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm
* 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
* 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
* 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
* 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
* 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage
* 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json
* 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage
* 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json
* 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370
* 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003"
* 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003"
* 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370
* 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage
* 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie
* 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json
* 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage
* 10:13 jmm@dns1004: END - running authdns-update
* 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 10:11 jmm@dns1004: START - running authdns-update
* 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
* 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
* 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie
* 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json
* 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
* 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json
* 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004
* 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004
* 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm
* 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
* 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
* 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage
* 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json
* 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage
* 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369
* 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003"
* 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003"
* 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1003.eqiad.wmnet with OS bookworm
* 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json
* 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369
* 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie
* 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json
* 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json
* 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage
* 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage
* 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
* 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json
* 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
* 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]])
* 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368
* 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368
* 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368
* 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003"
* 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003"
* 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json
* 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:44 moritzm: installing Apache security updates
* 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003
* 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003
* 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm
* 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to /var/cache/conftool/dbconfig/20260401-084047-fceratto.json
* 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json
* 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json
* 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet
* 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json
* 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage
* 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage
* 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json
* 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367
* 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367
* 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367
* 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003"
* 07:39 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003"
* 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367
* 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet
* 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet
* 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:26 moritzm: installing postgresql security updates
* 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet
* 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1011.eqiad.wmnet
* 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie
* 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage
* 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage
* 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366
* 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366
* 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366
* 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003"
* 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003"
* 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366
* 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie
* 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]]
* 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]]
* 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]]
* 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
* 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 01:12
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook 
sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript 
updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]]
* 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json
* 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json
* 12:17 kart_: Updated cxserver to 2026-03-25-072715-production
* 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
* 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
* 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json
* 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:01 fceratto@cumin1003: dbctl commit
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki
* 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]]
* 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s)
* 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]]

== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
mylno13j6iebyxazp0wzgzautjf4mq0 2398809 2398807 2026-04-03T15:27:30Z Stashbot 7414 jhancock@cumin2002: START - Cookbook sre.dns.netbox 2398809 wikitext text/x-wiki

== 2026-04-03 ==
* 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
* 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
* 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
* 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]]
* 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie
* 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for
host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet 
with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 
zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and 
with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host 
kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki
* 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]]
* 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s)
* 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]]

== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

== 2026-04-03 ==
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002"
* 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002"
* 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
* 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
* 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage
* 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]]
* 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie
* 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
* 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis
set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 
elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: 
host reimage
* 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage
* 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]]
* 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s)
* 00:53 zabe@deploy1003: zabe: Continuing with sync
* 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section
* 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json
* 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json
* 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section
* 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section
* 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section
* 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet
* 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet
* 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:13 volans@dns1004: END - running authdns-update
* 12:11 volans@dns1004: START - running authdns-update
* 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374
* 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json
* 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json
* 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot 
policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime 
(exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s)
* 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync
* 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]]
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
* 00:56 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
== 2026-04-03 == * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - 
Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage 
(exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - 
jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet
* 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet
* 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:13 volans@dns1004: END - running authdns-update
* 12:11 volans@dns1004: START - running authdns-update
* 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374
* 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json
* 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json
* 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards
* 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards
* 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s)
* 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
* 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
* 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 15:09 jforrester@deploy1003: jforrester: Continuing with sync
* 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]]
* 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet
* 14:59 taavi@dns1004: END - running authdns-update
* 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye
* 14:57 taavi@dns1004: START - running authdns-update
* 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet
* 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json
* 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968
* 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968
* 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet
* 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet
* 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet
* 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet
* 14:45
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
== 2026-04-03 == * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 
vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera 
data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host 
kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook 
sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> etztcn0efg72emsfolagc611m24c978 2398817 2398816 2026-04-03T15:32:13Z Stashbot 7414 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 2398817 wikitext text/x-wiki == 2026-04-03 == * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART 
and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet 
with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART == 2026-04-01 == * 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43
vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 
12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 
09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards
* 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards
* 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s)
* 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
* 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
* 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 15:09 jforrester@deploy1003: jforrester: Continuing with sync
* 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
== 2026-04-03 == * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook 
sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - 
vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude>
== 2026-04-03 == * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet 
with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 
vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 
elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]]
* 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s)
* 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync
* 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]]
* 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
* 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]]
* 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 18:56 cmooney@dns2005: END - running authdns-update
* 18:55 cmooney@dns2005: START - running authdns-update
* 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003"
* 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003"
* 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox
* 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica
* 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica
* 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
* 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
* 17:18 swfrench@dns1004: END - running authdns-update
* 17:16 swfrench@dns1004: START - running authdns-update
* 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
* 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica
* 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica
* 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica
* 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica
* 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance
* 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json
* 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^
* 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s)
* 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json
* 15:51 swfrench@deploy1003: swfrench: Continuing with sync
* 15:50 swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json
* 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json
* 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]]
* 15:32 moritzm: installing freetype security updates
* 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]]
* 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s)
* 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]]
* 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 15:23 papaul: maintenance complete on mr1-eqiad
* 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 15:11 moritzm: installing apache2 security updates
* 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json
* 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance
* 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json
* 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:59 papaul: ongoing maintenance on mr1-eqiad
* 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/
* 14:56 pt1979@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP
* 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json
* 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json
* 14:42 moritzm: installing libxml-parser-perl security updates
* 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json
* 14:28 moritzm: installing pyasn1 security updates
* 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted
* 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]]
* 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/
* 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]]
* 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]]
* 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/
* 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json
* 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
* 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json
* 13:58 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]]
* 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne
* 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]]
* 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne
* 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]]
* 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json
* 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]]
* 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to /var/cache/conftool/dbconfig/20260402-133923-fceratto.json
* 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json
* 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]]
* 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]]
* 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]]
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json
* 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json
* 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section
* 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json
* 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json
* 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section
* 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section
* 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section
* 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet
* 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet
* 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:13 volans@dns1004: END - running authdns-update
* 12:11 volans@dns1004: START - running authdns-update
* 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374
* 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json
* 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json
* 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json
* 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json
* 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json
* 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json
* 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
* 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json
* 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
* 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json
* 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 10:37 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
* 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
* 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json
* 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
* 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json
* 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
* 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet
* 10:19 moritzm: installing freetype security updates
* 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
* 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
* 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet
* 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
* 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
* 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
* 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
* 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
* 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
* 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
* 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json
* 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
* 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json
* 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
* 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
* 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
* 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync
* 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json
* 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw
* 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
* 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
* 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw
* 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json
* 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json
* 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]]
* 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
* 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync
* 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
* 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json
* 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
* 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync
* 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]]
* 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885
* 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s)
* 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync
* 07:55 jmm@dns1004: END - running authdns-update
* 07:54 jmm@dns1004: START - running authdns-update
* 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json
* 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json
* 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json
* 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet
* 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json
* 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki
* 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]]
* 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s)
* 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]]

== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

== 2026-04-03 ==
* 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie
* 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
* 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
* 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009
* 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009
* 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008
* 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007
* 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002"
* 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002"
* 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
* 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on
kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook 
sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: 
START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki 
(T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]]
* 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s)
* 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync
* 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP
* 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json
* 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json
* 14:42 moritzm: installing libxml-parser-perl security updates
* 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json
* 14:28 moritzm: installing pyasn1 security updates
* 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]]
* 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json
* 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json
* 12:17 kart_: Updated cxserver to 2026-03-25-072715-production
* 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
* 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
* 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json
* 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json
* 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json
* 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json
* 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance
* 11:33 moritzm: installing tomcat10 security updates
* 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
* 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json
* 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
* 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
* 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
* 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
* 11:12 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
* 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
* 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json
* 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
* 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
* 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json
* 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
* 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
* 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie
* 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json
* 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json
* 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
* 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json
* 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm
* 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm
* 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
* 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
* 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
* 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
* 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
* 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage
* 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json
* 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage
* 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json
* 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370
* 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003"
* 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003"
* 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370
* 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage
* 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie
* 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json
* 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage
* 10:13 jmm@dns1004: END - running authdns-update
* 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
* 10:11 jmm@dns1004: START - running authdns-update
* 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
* 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply
* 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply
* 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie
* 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json
* 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
* 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json
* 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004
* 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004
* 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm
* 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply
* 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply
* 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage
* 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json
* 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage
* 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369
* 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003"
* 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003"
* 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1003.eqiad.wmnet with OS bookworm
* 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json
* 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369
* 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie
* 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json
* 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
* 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json
* 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage
* 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage
* 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
* 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json
* 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
* 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]])
* 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368
* 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368
* 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368
* 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003"
* 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003"
* 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json
* 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:44 moritzm: installing Apache security updates
* 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003
* 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003
* 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm
* 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to /var/cache/conftool/dbconfig/20260401-084047-fceratto.json
* 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json
* 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json
* 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet
* 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json
* 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage
* 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage
* 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json
* 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
* 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
* 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367
* 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367
* 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367
* 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003"
* 07:39 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003"
* 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367
* 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet
* 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet
* 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:26 moritzm: installing postgresql security updates
* 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet
* 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1011.eqiad.wmnet
* 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie
* 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage
* 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage
* 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366
* 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366
* 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366
* 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
* 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003"
* 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003"
* 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366
* 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie
* 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]]
* 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]]
* 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]]
* 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
* 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
* 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s)
* 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
* 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s)
* 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]]
* 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s)
* 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: 
END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 
13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running 
authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for 
Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye * 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]] * 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]] * 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s) * 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync * 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] * 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s) * 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage * 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

== 2026-04-03 ==
* 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage
* 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie
* 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie
* 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
* 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
* 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009
* 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009
* 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008
* 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007
* 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002"
* 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by
cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook 
sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 
elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki 
(T422062)]] (duration: 06m 50s)
* 00:53 zabe@deploy1003: zabe: Continuing with sync
* 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]
== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP
* 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json
* 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json
* 14:42 moritzm: installing libxml-parser-perl security updates
* 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json
* 14:28 moritzm: installing pyasn1 security updates
* 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards
* 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards
* 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s)
* 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
* 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
* 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 15:09 jforrester@deploy1003: jforrester: Continuing with sync
* 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]]
* 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet
* 14:59 taavi@dns1004: END - running authdns-update
* 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye
* 14:57 taavi@dns1004: START - running authdns-update
* 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet
* 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
* 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json
* 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968
* 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968
* 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet
* 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet
* 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet
* 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet
* 14:45
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]]
* 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json
* 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json
* 12:17 kart_: Updated cxserver to 2026-03-25-072715-production
* 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
* 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
* 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json
* 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:01 fceratto@cumin1003: dbctl commit
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki
* 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]]
* 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s)
* 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]]

== Other archives ==
See [[Server Admin Log/Archives]].
<noinclude>
[[Category:SAL]]
[[Category:Operations]]
</noinclude>

== 2026-04-03 ==
* 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage
* 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie
* 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie
* 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
* 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
* 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009
* 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009
* 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008
* 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007
* 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
* 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002"
*
15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook 
sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook 
sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for 
[[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s)
* 00:53 zabe@deploy1003: zabe: Continuing with sync
* 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
* 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json
* 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
* 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]])
* 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]])
* 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
* 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
* 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json
* 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json
* 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json
* 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
* 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json
* 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet
* 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json
* 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera 
(exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - 
Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - 
Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript 
updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368
* 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie
* 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet
* 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet
* 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json
* 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie
* 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet
* 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]]
* 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json
* 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 18:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host 
conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook 
sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] 
DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: 
Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section
* 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json
* 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json
* 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section
* 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section
* 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section
* 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section
* 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet
* 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet
* 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
* 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
* 12:13 volans@dns1004: END - running authdns-update
* 12:11 volans@dns1004: START - running authdns-update
* 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374
* 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie
* 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie
* 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json
* 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
* 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json
* 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
* 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
* 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
* 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
* 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
* 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
* 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> 2398831 2398830 2026-04-03T18:56:26Z Stashbot 7414 == 2026-04-03 == * 18:56 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon1003.eqiad.wmnet with OS trixie * 18:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host 
conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set 
policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on 
kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host 
kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 18:56 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon1003.eqiad.wmnet with OS trixie * 18:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook 
sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with 
Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: 
START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSFP to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 18:56 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon1003.eqiad.wmnet with OS trixie * 18:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook 
sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START 
- Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage 
(exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. 
* 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP
* 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json
* 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json
* 14:42 moritzm: installing libxml-parser-perl security updates
* 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json
* 14:28 moritzm: installing pyasn1 security updates
* 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]]
* 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s)
* 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards
* 07:43 jnuche@deploy1003: jnuche: Continuing with sync
* 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]]
* 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout
* 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s)
* 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host
* 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049
* 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049
* 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 23:48 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: [[phab:T421398|T421398]] * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 18:56 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon1003.eqiad.wmnet with OS trixie * 18:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 
* 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision 
(exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network 
and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - 
Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-03 == * 23:49 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on zuul2002.codfw.wmnet with reason: [[phab:T421398|T421398]] * 23:48 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: [[phab:T421398|T421398]] * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 18:56 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon1003.eqiad.wmnet with OS trixie * 18:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage
for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host 
ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - 
jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with 
reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART

== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-04 == * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-04-03 == * 23:49 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on zuul2002.codfw.wmnet with reason: [[phab:T421398|T421398]] * 23:48 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: [[phab:T421398|T421398]] * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 18:56 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon1003.eqiad.wmnet with OS trixie * 18:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: 
START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook 
sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by 
cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook 
sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]]

== 2026-04-02 ==
* 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
* 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
* 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s)
* 23:37 zabe@deploy1003: zabe: Continuing with sync
* 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]]
* 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s)
* 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync
* 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]]
* 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s)
* 21:28 kemayo@deploy1003: kemayo: Continuing with sync
* 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]]
* 21:18 kemayo@deploy1003: kemayo: Continuing with sync
* 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]]
* 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s)
* 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync
* 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP
* 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]]
* 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json
* 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply
* 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply
* 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json
* 14:42 moritzm: installing libxml-parser-perl security updates
* 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json
* 14:28 moritzm: installing pyasn1 security updates
* 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]]
* 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
* 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
* 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json
* 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s)
* 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
* 12:52 kharlan@deploy1003: kharlan: Continuing with sync
* 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json
* 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]]
* 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json
* 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s)
* 12:29 kharlan@deploy1003: kharlan: Continuing with sync
* 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]]
* 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json
* 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
* 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json
* 12:17 kart_: Updated cxserver to 2026-03-25-072715-production
* 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
* 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
* 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
* 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
* 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
* 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json
* 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
* 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
* 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> == 2026-04-04 == * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 12s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image == 2026-04-03 == * 23:49 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on zuul2002.codfw.wmnet with reason: [[phab:T421398|T421398]] * 23:48 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: [[phab:T421398|T421398]] * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:13 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 18:56 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon1003.eqiad.wmnet with OS trixie * 18:39 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:38 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:32 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage * 18:19 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon1003.eqiad.wmnet with OS trixie * 18:16 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafkamon2003.codfw.wmnet with OS trixie * 16:18 
brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie * 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2009 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2009 * 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2008 * 15:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2008 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 15:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2008 to codfw - jhancock@cumin2002" * 15:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:04 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:58 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon2003.codfw.wmnet with reason: host reimage * 14:52 sbassett: Deployed security mitigation for [[phab:T422244|T422244]] * 14:40 herron@cumin1003: START - Cookbook sre.hosts.reimage for host kafkamon2003.codfw.wmnet with OS trixie * 13:43 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host 
ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:39 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:38 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:34 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:27 vriley@cumin1003: START - Cookbook sre.hosts.provision for host ganeti1058.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED * 13:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1058 * 13:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1058 * 13:25 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:22 vriley@cumin1003: START - Cookbook sre.dns.netbox * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 13:08 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:08 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [ganeti1055] - vriley@cumin1003" * 13:03 vriley@cumin1003: START - Cookbook sre.dns.netbox * 12:22 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 12:22 
jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003" * 12:16 jclark@cumin1003: START - Cookbook sre.dns.netbox * 10:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS trixie * 10:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:32 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage * 10:30 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 10:29 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 10:19 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1010.eqiad.wmnet with OS trixie * 10:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS trixie * 09:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:54 brouberol@dns1004: END - running authdns-update * 09:52 brouberol@dns1004: START - running authdns-update * 09:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage * 09:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1009.eqiad.wmnet with OS trixie * 09:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS trixie * 09:14 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 
2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 09:07 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage * 08:51 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1008.eqiad.wmnet with OS trixie * 08:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS trixie * 08:29 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:22 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage * 08:06 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1007.eqiad.wmnet with OS trixie * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 21s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 00:59 zabe: zabe@deploy1003:~$ mwscript updateSpecialPages.php testcommonswiki # [[phab:T422062|T422062]] * 00:58 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] (duration: 06m 50s) * 00:53 zabe@deploy1003: zabe: Continuing with sync * 00:53 zabe@deploy1003: zabe: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 00:51 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1267286{{!}}Disable QueryPage updates for Special:Unusedtemplates on testcommonswiki (T422062)]] == 2026-04-02 == * 23:52 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 23:51 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 23:41 zabe@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] (duration: 06m 10s) * 23:37 zabe@deploy1003: zabe: Continuing with sync * 23:37 zabe@deploy1003: zabe: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 23:35 zabe@deploy1003: Started scap sync-world: Backport for [[gerrit:1264110{{!}}Start reading from new file table in dewiki and fawiki (T416548)]] * 23:06 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 23:06 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 22:20 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 22:05 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] (duration: 07m 33s) * 22:01 jdlrobson@deploy1003: jdlrobson: Continuing with sync * 22:00 jdlrobson@deploy1003: jdlrobson: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 21:58 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1267214{{!}}Fix section heading spacing on mobile (T414882)]] * 21:32 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] (duration: 06m 18s) * 21:28 kemayo@deploy1003: kemayo: Continuing with sync * 21:28 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:26 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:18 kemayo@deploy1003: kemayo: Continuing with sync * 21:17 kemayo@deploy1003: kemayo: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:15 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267204{{!}}SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise]] * 21:03 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] (duration: 11m 40s) * 20:59 kemayo@deploy1003: annet, kemayo: Continuing with sync * 20:53 kemayo@deploy1003: annet, kemayo: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 20:52 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1267008{{!}}Add logged-in reader retention instrument (T420490)]] * 20:37 kemayo@deploy1003: Finished scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] (duration: 11m 46s) * 20:31 kemayo@deploy1003: 1f616emo, kemayo: Continuing with sync * 20:29 kemayo@deploy1003: 1f616emo, kemayo: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:25 kemayo@deploy1003: Started scap sync-world: Backport for [[gerrit:1264569{{!}}zhwikinews: 20th anniversary logo change (T420165)]] * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:09 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:03 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 20:02 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:57 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:24 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: [[phab:T418109|T418109]] * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 19:01 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 18:56 cmooney@dns2005: END - running authdns-update * 18:55 cmooney@dns2005: START - running authdns-update * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 18:52 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:51 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns names for new lumen 100g transport - cmooney@cumin1003" * 18:45 cmooney@cumin1003: START - Cookbook sre.dns.netbox * 18:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:24 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs5006.eqsin.wmnet<nowiki>}</nowiki> and A:liberica * 18:00 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 18:00 ebysans@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/rest-gateway: apply * 17:59 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 17:18 swfrench@dns1004: END - running authdns-update * 17:16 swfrench@dns1004: START - running authdns-update * 17:13 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 17:12 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 17:12 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 17:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:11 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. * 17:11 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 17:10 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 17:08 swfrench@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 17:07 swfrench@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 17:05 swfrench@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 17:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 17:02 swfrench@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. 
* 16:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3008.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:53 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3009.esams.wmnet<nowiki>}</nowiki> and A:liberica * 16:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:09 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance * 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90247 and previous config saved to /var/cache/conftool/dbconfig/20260402-160942-fceratto.json * 16:03 Lucas_WMDE: UTC afternoon backport+config window (very belatedly) done ^^ * 16:02 swfrench@deploy1003: Finished scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] (duration: 29m 56s) * 15:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90246 and previous config saved to /var/cache/conftool/dbconfig/20260402-155934-fceratto.json * 15:51 swfrench@deploy1003: swfrench: Continuing with sync * 15:50 
swfrench@deploy1003: swfrench: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 15:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P90245 and previous config saved to /var/cache/conftool/dbconfig/20260402-154925-fceratto.json * 15:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90244 and previous config saved to /var/cache/conftool/dbconfig/20260402-153918-fceratto.json * 15:33 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} (attempt 2) - [[phab:T422143|T422143]] * 15:32 moritzm: installing freetype security updates * 15:31 swfrench-wmf: restarted docker-registry-ml.service on registry200[45] - [[phab:T422166|T422166]] * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 swfrench@deploy1003: sync-world aborted: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] (duration: 26m 48s) * 15:26 swfrench-wmf: restarted docker-registry-restricted.service on registry200[45] - [[phab:T422166|T422166]] * 15:23 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 15:23 papaul: maintenance complete on mr1-eqiad * 15:22 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. 
* 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:11 moritzm: installing apache2 security updates * 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 15:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2227 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90242 and previous config saved to /var/cache/conftool/dbconfig/20260402-150542-fceratto.json * 15:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance * 15:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90241 and previous config saved to /var/cache/conftool/dbconfig/20260402-150517-fceratto.json * 15:03 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 15:03 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:59 swfrench@deploy1003: Started scap sync-world: Manual full-rebuild sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:59 papaul: ongoing maintenance on mr1-eqiad * 14:57 swfrench@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:56 pt1979@cumin2002: DONE 
(PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-eqiad,mr1-eqiad IPv6 with reason: switching from OSPF to BGP * 14:56 swfrench@deploy1003: Started scap sync-world: Manual sync-world to pick up {{Gerrit|1267062}}, {{Gerrit|1266985}} - [[phab:T422143|T422143]] * 14:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90239 and previous config saved to /var/cache/conftool/dbconfig/20260402-145508-fceratto.json * 14:53 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:53 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:52 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 14:51 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 14:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P90237 and previous config saved to /var/cache/conftool/dbconfig/20260402-144500-fceratto.json * 14:42 moritzm: installing libxml-parser-perl security updates * 14:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90236 and previous config saved to /var/cache/conftool/dbconfig/20260402-143452-fceratto.json * 14:28 moritzm: installing pyasn1 security updates * 14:18 ladsgroup@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging 
--mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted * 14:17 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1267062{{!}}Bump maxConnCount]] * 14:10 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:09 esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 14:03 hashar: Jenkins CI: reloading configuration from disk to poll new nodes # [[phab:T421114|T421114]] * 14:00 esanders@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/ * 14:00 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2205 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90233 and previous config saved to /var/cache/conftool/dbconfig/20260402-140004-fceratto.json * 13:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance * 13:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90232 and previous config saved to /var/cache/conftool/dbconfig/20260402-135939-fceratto.json * 13:58 
esanders@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:55 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:54 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:53 lucaswerkmeister-wmde@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.21,1.46.0-wmf.22,next --multiversion-image-basename docker-registry.discovery.wmne * 13:51 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [[gerrit:1266985{{!}}Fix suggestion mode availability check (T422143)]] * 13:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90231 and previous config saved to /var/cache/conftool/dbconfig/20260402-134931-fceratto.json * 13:42 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/linked-artifacts: apply * 13:41 jasmine@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P90230 and previous config saved to 
/var/cache/conftool/dbconfig/20260402-133923-fceratto.json * 13:37 eevans@deploy1003: helmfile [staging] START helmfile.d/services/linked-artifacts: apply * 13:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90229 and previous config saved to /var/cache/conftool/dbconfig/20260402-132914-fceratto.json * 13:18 jasmine@cumin1003: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: maintenance - [[phab:T414486|T414486]] * 13:17 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 13:17 jasmine@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool codfw [reason: no reason specified, [[phab:T414486|T414486]]] * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2194 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90228 and previous config saved to /var/cache/conftool/dbconfig/20260402-125732-fceratto.json * 12:57 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90227 and previous config saved to /var/cache/conftool/dbconfig/20260402-125707-fceratto.json * 12:56 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 12:50 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool es1042: Restoring section * 12:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90225 and previous config saved to /var/cache/conftool/dbconfig/20260402-124659-fceratto.json * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS trixie * 12:45 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 12:44 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 12:41 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P90224 and previous config saved to /var/cache/conftool/dbconfig/20260402-123650-fceratto.json * 12:33 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042: Restoring section * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:32 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:32 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool es1042.eqiad.wmnet: Restoring section * 12:31 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:30 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool es1042.eqiad.wmnet: Restoring section * 12:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:28 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es1042.eqiad.wmnet * 12:28 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for es1042.eqiad.wmnet * 12:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90222 and previous config saved to /var/cache/conftool/dbconfig/20260402-122642-fceratto.json * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage * 12:22 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage * 12:13 volans@dns1004: END - running authdns-update * 12:11 volans@dns1004: START - running authdns-update * 12:11 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1374 * 12:10 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1374 * 12:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1373 * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS trixie * 12:09 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS trixie * 11:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2190 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90221 and previous config saved to /var/cache/conftool/dbconfig/20260402-115511-fceratto.json * 11:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance * 11:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90220 and previous config saved to /var/cache/conftool/dbconfig/20260402-115446-fceratto.json * 11:52 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 11:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90219 and previous config saved to /var/cache/conftool/dbconfig/20260402-114437-fceratto.json * 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P90218 and previous config saved to /var/cache/conftool/dbconfig/20260402-113429-fceratto.json * 11:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90217 and previous config saved to /var/cache/conftool/dbconfig/20260402-112421-fceratto.json * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2177 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90216 and previous config saved to /var/cache/conftool/dbconfig/20260402-105142-fceratto.json * 10:51 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90215 and previous config saved to /var/cache/conftool/dbconfig/20260402-105129-fceratto.json * 10:45 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:45 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:43 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet * 10:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90214 and previous config saved to /var/cache/conftool/dbconfig/20260402-104121-fceratto.json * 10:40 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:37 
daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet * 10:31 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P90213 and previous config saved to /var/cache/conftool/dbconfig/20260402-103113-fceratto.json * 10:31 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90212 and previous config saved to /var/cache/conftool/dbconfig/20260402-102105-fceratto.json * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'. * 10:19 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker-exp2001.codfw.wmnet * 10:19 moritzm: installing freetype security updates * 10:19 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'. * 10:19 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'. * 10:18 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'. 
* 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'. * 10:18 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'. * 10:17 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:17 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'. * 10:17 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'. * 10:16 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply * 10:16 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'. 
* 10:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker-exp2001.codfw.wmnet * 10:14 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply * 10:12 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply * 10:10 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply * 10:05 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply * 09:48 javiermonton@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync * 09:48 javiermonton@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90211 and previous config saved to /var/cache/conftool/dbconfig/20260402-094834-fceratto.json * 09:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance * 09:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90210 and previous config saved to /var/cache/conftool/dbconfig/20260402-094808-fceratto.json * 09:48 javiermonton@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync * 09:47 javiermonton@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync * 09:45 javiermonton@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: sync * 09:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90209 and previous config saved to /var/cache/conftool/dbconfig/20260402-093759-fceratto.json * 09:33 
jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master-codfw * 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors * 09:29 jmm@cumin2002: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master-codfw * 09:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P90208 and previous config saved to /var/cache/conftool/dbconfig/20260402-092751-fceratto.json * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:27 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 09:19 moritzm: upgrading Envoy on the config-master servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 09:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90207 and previous config saved to /var/cache/conftool/dbconfig/20260402-091743-fceratto.json * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 08:55 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 08:49 moritzm: added Atsuko to the cn=ops LDAP group [[phab:T421860|T421860]] * 08:46 dpogorzelski@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: sync * 08:45 dpogorzelski@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: sync * 08:44 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2149 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90206 and previous config saved to /var/cache/conftool/dbconfig/20260402-084452-fceratto.json * 08:44 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance * 08:44 dpogorzelski@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: sync * 08:42 XioNoX: reboot mr1-esams - [[phab:T416450|T416450]] * 08:30 volans: briefly disabling puppet on P:installserver::proxy to deploy g/1266885 * 08:17 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 08:10 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:00 mszwarc@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] (duration: 10m 13s) * 07:56 mszwarc@deploy1003: mszwarc: Continuing with sync * 07:55 jmm@dns1004: END - running authdns-update * 07:54 jmm@dns1004: START - running authdns-update * 07:52 mszwarc@deploy1003: mszwarc: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 07:50 mszwarc@deploy1003: Started scap sync-world: Backport for [[gerrit:1266866{{!}}Disable external link analysis (T419837)]] * 07:47 jnuche@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] (duration: 06m 39s) * 07:47 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (, [[phab:T421714|T421714]]) xfer wdqs-all from wdqs2016.codfw.wmnet -> wdqs1027.eqiad.wmnet, repooling both afterwards * 07:43 jnuche@deploy1003: jnuche: Continuing with sync * 07:43 jnuche@deploy1003: jnuche: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:41 jnuche@deploy1003: Started scap sync-world: Backport for [[gerrit:1266861{{!}}ApiAuthManagerHelper: Accept fields with undefined label (T422027)]] * 07:38 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hcapcha rollout * 07:38 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host (duration: 00m 05s) * 07:38 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fea7794]: deploy to freshly reimaged wdqs host * 07:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 64049 * 07:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 64049 * 07:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' . * 07:24 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' . 
* 07:12 gkyziridis@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] (duration: 07m 00s) * 07:08 gkyziridis@deploy1003: gkyziridis: Continuing with sync * 07:07 gkyziridis@deploy1003: gkyziridis: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 07:05 gkyziridis@deploy1003: Started scap sync-world: Backport for [[gerrit:1266228{{!}}EventStreamConfig: Add rr-multilingual prediction_change stream (T415892)]] * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:41 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:30 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:19 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:18 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:08 jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 01:06 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART * 00:56 
jasmine@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
== 2026-04-01 ==
* 23:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1027.eqiad.wmnet with OS bullseye
* 22:58 swfrench-wmf: removed unused image-suggestion service in eqiad - [[phab:T368096|T368096]]
* 22:48 swfrench-wmf: removed unused image-suggestion service in codfw - [[phab:T368096|T368096]]
* 22:48 jdlrobson@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] (duration: 08m 25s)
* 22:43 jdlrobson@deploy1003: lmora, jdlrobson: Continuing with sync
* 22:41 jdlrobson@deploy1003: lmora, jdlrobson: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:39 jdlrobson@deploy1003: Started scap sync-world: Backport for [[gerrit:1265482{{!}}Legal Footer Link Deploys (T420348)]]
* 22:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] (duration: 06m 37s)
* 22:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
* 22:29 ladsgroup@deploy1003: ladsgroup: Continuing with sync
* 22:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
* 22:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266443{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]], [[gerrit:1266442{{!}}Deferred: Fix function to get virtual domain (T421914 T398709)]] * 22:07 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=codfw * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1027 * 22:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1027 * 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1027 * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1027.eqiad.wmnet 98.32.64.10.in-addr.arpa 8.9.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 22:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1027 - bking@cumin2002" * 21:42 swfrench@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] (duration: 07m 15s) * 21:38 swfrench@deploy1003: swfrench: Continuing with sync * 21:36 swfrench@deploy1003: swfrench: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], 
[[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 21:35 swfrench@deploy1003: Started scap sync-world: Backport for [[gerrit:1266406{{!}}Only set the thumb step when width is given (T422074)]], [[gerrit:1266407{{!}}Only set the thumb step when width is given (T422074)]] * 21:34 bking@cumin2002: START - Cookbook sre.dns.netbox * 21:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1027 * 21:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye * 21:14 brett: Reboot lvs1013, lvs1014, lvs1015, and lvs1017 for kernel updates * 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:53 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:52 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:50 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 20:47 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:43 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P<nowiki>{</nowiki>lvs3010.esams.wmnet<nowiki>}</nowiki> and A:liberica * 20:23 bking@cumin2002: END (PASS) - Cookbook 
sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage * 20:13 cjming@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] (duration: 08m 47s) * 20:09 cjming@deploy1003: mmartorana, cjming: Continuing with sync * 20:06 cjming@deploy1003: mmartorana, cjming: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 20:04 cjming@deploy1003: Started scap sync-world: Backport for [[gerrit:1266314{{!}}config: Enable EmailConfirmationBanner on mediawikiwiki (T421366)]] * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1010 * 20:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010 * 19:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010 * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1010.eqiad.wmnet 24.48.64.10.in-addr.arpa 4.2.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 19:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:58 bking@cumin2002: START - Cookbook 
sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1010 - bking@cumin2002" * 19:53 bking@cumin2002: START - Cookbook sre.dns.netbox * 19:53 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1010 * 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye * 18:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye * 18:11 ebysans@deploy1003: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [codfw] START helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply * 18:10 ebysans@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply * 18:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:04 aokoth@cumin1003: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet * 18:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage * 18:02 aokoth@cumin1003: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet * 18:01 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs * 17:56 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] (duration: 08m 18s) * 17:52 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 17:50 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster 
image of videos (T414805)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 17:48 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1266326{{!}}Refix thumb steps for the poster image of videos (T414805)]], [[gerrit:1266327{{!}}Refix thumb steps for the poster image of videos (T414805)]] * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cloudelastic1009 * 17:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009 * 17:42 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009 * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cloudelastic1009.eqiad.wmnet 30.32.64.10.in-addr.arpa 0.3.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 17:42 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host cloudelastic1009 - bking@cumin2002" * 17:39 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 01m 53s) * 17:38 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (thin): Extend mediarequest Cassandra loads with 
poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:37 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] (duration: 04m 15s) * 17:33 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8]: Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] [analytics/refinery@fa28ad83] * 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox * 17:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cloudelastic1009 * 17:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye * 17:32 ebysans@deploy1003: Finished deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] (duration: 01m 52s) * 17:30 ebysans@deploy1003: Started deploy [analytics/refinery@fa28ad8] (hadoop-test): Extend mediarequest Cassandra loads with poster/plays for video-requests API [[phab:T415202|T415202]] TEST [analytics/refinery@fa28ad83] * 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply * 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply * 17:18 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] (duration: 07m 25s) * 17:12 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to remove unused image-suggestion listener - [[phab:T368096|T368096]] * 17:09 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy 
(exit_code=0) rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:53 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] (duration: 11m 30s) * 16:49 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync * 16:44 dreamyjazz@deploy1003: dreamyjazz: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:42 dreamyjazz@deploy1003: Started scap sync-world: Backport for [[gerrit:1266317{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]], [[gerrit:1266316{{!}}hCaptcha: Add log and counter when all SiteVerify attempts fail (T421678)]] * 16:36 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 16:33 fabfur@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on A:cp-drmrs - New configuration/test ([[phab:T421402|T421402]]) * 16:29 urbanecm@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] (duration: 09m 31s) * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 16:24 urbanecm@deploy1003: urbanecm: Continuing with sync * 16:21 urbanecm@deploy1003: urbanecm: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 16:19 urbanecm@deploy1003: Started scap sync-world: Backport for [[gerrit:1266309{{!}}Set the default for UserEmailConfirmationUseHTML to true (T411147)]], [[gerrit:1260011{{!}}cleanup: Remove UserEmailConfirmationUseHTML (defaults to true) (T411147)]] * 16:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host conf2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART * 16:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host conf2007 * 16:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host conf2007 * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding conf2007 to codfw - jhancock@cumin2002" * 16:09 jhancock@cumin2002: START - Cookbook sre.dns.netbox * 15:32 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 fabfur@cumin1003: 
END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 15:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer commons from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling 
both afterwards * 15:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer ([[phab:T421714|T421714]], prepare newly-reimaged host) xfer wikidata from wcqs1001.eqiad.wmnet -> wcqs1003.eqiad.wmnet, repooling both afterwards * 15:13 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] (duration: 12m 53s) * 15:12 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply * 15:11 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply * 15:10 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply * 15:09 jforrester@deploy1003: jforrester: Continuing with sync * 15:03 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 15:01 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266290{{!}}Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T411807)]] * 15:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1208.eqiad.wmnet * 14:59 taavi@dns1004: END - running authdns-update * 14:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wcqs1003.eqiad.wmnet with OS bullseye * 14:57 taavi@dns1004: START - running authdns-update * 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2005.codfw.wmnet * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 14:53 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance * 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90188 and previous config saved to /var/cache/conftool/dbconfig/20260401-145256-fceratto.json * 14:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34968 * 14:52 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 34968 * 14:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2005.codfw.wmnet * 14:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker2004.codfw.wmnet * 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host db1208.eqiad.wmnet * 14:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker2004.codfw.wmnet * 14:45 
fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:44 fabfur: upgrading ulsfo to haproxy 3.2 ([[phab:T421402|T421402]]) * 14:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90187 and previous config saved to /var/cache/conftool/dbconfig/20260401-144247-fceratto.json * 14:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P90186 and previous config saved to /var/cache/conftool/dbconfig/20260401-143239-fceratto.json * 14:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:28 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:28 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:26 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:25 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90184 and previous config saved to /var/cache/conftool/dbconfig/20260401-142231-fceratto.json * 14:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wcqs1003.eqiad.wmnet with reason: host reimage * 14:21 jforrester@deploy1003: helmfile [eqiad] DONE 
helmfile.d/services/wikifunctions: apply * 14:21 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:20 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:18 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] (duration: 08m 14s) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1148.eqiad.wmnet * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:13 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:13 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1148.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003" * 14:12 jforrester@deploy1003: jforrester: Continuing with sync * 14:11 volans: uploaded cumin_6.0.0 to apt.wikimedia.org bookworm-wikimedia,trixie-wikimedia * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 14:11 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade 
([[phab:T421402|T421402]]) * 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:10 jforrester@deploy1003: jforrester: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:09 brouberol@cumin1003: START - Cookbook sre.dns.netbox * 14:08 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:08 jforrester@deploy1003: Started scap sync-world: Backport for [[gerrit:1266190{{!}}MemcachedWrapper: Hash key when longer than 250 characters]], [[gerrit:1266219{{!}}Extend queue processing times for abstract fragments (T421581)]] * 14:07 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1259 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90182 and previous config saved to /var/cache/conftool/dbconfig/20260401-140707-fceratto.json * 14:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance * 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90181 and previous config saved to /var/cache/conftool/dbconfig/20260401-140654-fceratto.json * 14:05 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply * 14:05 
jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply * 14:05 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wcqs1003 * 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wcqs1003 * 14:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply * 14:04 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wcqs1003 * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wcqs1003.eqiad.wmnet 9.32.64.10.in-addr.arpa 9.0.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 14:04 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wcqs1003 - bking@cumin2002" * 14:03 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1148.eqiad.wmnet * 14:03 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply * 14:02 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply * 14:00 bking@cumin2002: START - Cookbook sre.dns.netbox * 14:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wcqs1003 * 13:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host 
wcqs1003.eqiad.wmnet with OS bullseye * 13:56 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90179 and previous config saved to /var/cache/conftool/dbconfig/20260401-135646-fceratto.json * 13:51 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-lab1002.eqiad.wmnet * 13:51 klausman@cumin1003: START - Cookbook sre.hosts.remove-downtime for ml-lab1002.eqiad.wmnet * 13:50 klausman@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet * 13:46 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 13:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P90178 and previous config saved to /var/cache/conftool/dbconfig/20260401-134638-fceratto.json * 13:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART * 13:41 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 13:41 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 13:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90177 and previous config saved to /var/cache/conftool/dbconfig/20260401-133629-fceratto.json * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:33 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: 
helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:30 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:29 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 13:22 moritzm: purge prometheus-nginx-exporter from url downloaders, remnants of early hCaptcha rollout * 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1254 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90176 and previous config saved to /var/cache/conftool/dbconfig/20260401-132149-fceratto.json * 13:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance * 13:21 fabfur: upgrading magru to haproxy 3.2 ([[phab:T421402|T421402]]) * 13:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:19 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru - 3.2 upgrade ([[phab:T421402|T421402]]) * 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org * 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance * 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90174 and previous config saved to /var/cache/conftool/dbconfig/20260401-130753-fceratto.json * 13:06 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 
1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org * 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org * 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90173 and previous config saved to /var/cache/conftool/dbconfig/20260401-125744-fceratto.json * 12:56 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] (duration: 09m 21s) * 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org * 12:52 kharlan@deploy1003: kharlan: Continuing with sync * 12:49 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P90171 and previous config saved to /var/cache/conftool/dbconfig/20260401-124736-fceratto.json * 12:47 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266227{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]], [[gerrit:1266226{{!}}hCaptcha: Retry SiteVerify API on HTTP error and adjust timeout (T421678)]] * 12:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90170 and previous config saved to /var/cache/conftool/dbconfig/20260401-123728-fceratto.json * 12:33 kharlan@deploy1003: Finished scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] (duration: 07m 34s) * 12:29 kharlan@deploy1003: kharlan: Continuing with sync * 12:28 kharlan@deploy1003: kharlan: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. 
* 12:26 kharlan@deploy1003: Started scap sync-world: Backport for [[gerrit:1266223{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]], [[gerrit:1266222{{!}}Revert "SuggestedInvestigations: Import session into signal matching job" (T421062)]] * 12:22 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1233 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90169 and previous config saved to /var/cache/conftool/dbconfig/20260401-122203-fceratto.json * 12:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance * 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90168 and previous config saved to /var/cache/conftool/dbconfig/20260401-122138-fceratto.json * 12:17 kart_: Updated cxserver to 2026-03-25-072715-production * 12:17 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet * 12:13 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply * 12:12 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply * 12:11 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply * 12:11 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet * 12:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90167 and previous config saved to /var/cache/conftool/dbconfig/20260401-121130-fceratto.json * 12:02 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply * 12:02 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply * 12:01 fceratto@cumin1003: dbctl commit 
(dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P90166 and previous config saved to /var/cache/conftool/dbconfig/20260401-120122-fceratto.json * 11:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90164 and previous config saved to /var/cache/conftool/dbconfig/20260401-115114-fceratto.json * 11:48 moritzm: upgrading Envoy on the idp-test servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 11:35 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1229 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90162 and previous config saved to /var/cache/conftool/dbconfig/20260401-113556-fceratto.json * 11:35 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance * 11:33 moritzm: installing tomcat10 security updates * 11:27 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 11:27 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 11:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance * 11:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90161 and previous config saved to /var/cache/conftool/dbconfig/20260401-112125-fceratto.json * 11:18 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply * 11:17 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply * 11:17 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 11:16 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 11:12 mvolz@deploy1003: helmfile [codfw] DONE 
helmfile.d/services/citoid: apply * 11:11 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply * 11:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90160 and previous config saved to /var/cache/conftool/dbconfig/20260401-111117-fceratto.json * 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply * 11:09 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply * 11:06 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 11:05 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 11:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P90159 and previous config saved to /var/cache/conftool/dbconfig/20260401-110109-fceratto.json * 10:58 daniel@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply * 10:58 daniel@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply * 10:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90158 and previous config saved to /var/cache/conftool/dbconfig/20260401-105059-fceratto.json * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1197 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90157 and previous config saved to /var/cache/conftool/dbconfig/20260401-104847-fceratto.json * 10:48 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance * 10:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 
([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90156 and previous config saved to /var/cache/conftool/dbconfig/20260401-104823-fceratto.json * 10:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:47 moritzm: installing libpng1.6 security updates on Trixie/Bookworm * 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1004.eqiad.wmnet with OS bookworm * 10:47 daniel@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply * 10:46 daniel@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply * 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org * 10:41 daniel@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply * 10:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org * 10:40 daniel@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply * 10:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P90154 and previous config saved to /var/cache/conftool/dbconfig/20260401-103816-fceratto.json * 10:36 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1370.eqiad.wmnet with reason: host reimage * 10:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to 
https://phabricator.wikimedia.org/P90152 and previous config saved to /var/cache/conftool/dbconfig/20260401-102807-fceratto.json * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1370 * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1370.eqiad.wmnet 204.48.64.10.in-addr.arpa 4.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 10:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1370 - ayounsi@cumin1003" * 10:19 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1370 * 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host 
reimage * 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1370.eqiad.wmnet with OS trixie * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90151 and previous config saved to /var/cache/conftool/dbconfig/20260401-101758-fceratto.json * 10:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1004.eqiad.wmnet with reason: host reimage * 10:13 jmm@dns1004: END - running authdns-update * 10:12 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply * 10:11 jmm@dns1004: START - running authdns-update * 10:09 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:08 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/proton: apply * 10:08 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/proton: apply * 10:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1369.eqiad.wmnet with OS trixie * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE 
helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 10:04 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1188 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90150 and previous config saved to /var/cache/conftool/dbconfig/20260401-095943-fceratto.json * 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance * 09:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90149 and previous config saved to /var/cache/conftool/dbconfig/20260401-095920-fceratto.json * 09:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1004 * 09:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1004 * 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1004.eqiad.wmnet with OS bookworm * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:55 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:54 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/proton: apply * 09:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/proton: apply * 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host 
reimage * 09:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90148 and previous config saved to /var/cache/conftool/dbconfig/20260401-094912-fceratto.json * 09:44 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1369.eqiad.wmnet with reason: host reimage * 09:39 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P90147 and previous config saved to /var/cache/conftool/dbconfig/20260401-093903-fceratto.json * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1369 * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1369.eqiad.wmnet 203.48.64.10.in-addr.arpa 3.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:32 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1369 - ayounsi@cumin1003" * 09:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for 
host an-druid1003.eqiad.wmnet with OS bookworm * 09:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90146 and previous config saved to /var/cache/conftool/dbconfig/20260401-092855-fceratto.json * 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1369 * 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1369.eqiad.wmnet with OS trixie * 09:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1368.eqiad.wmnet with OS trixie * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:21 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1182 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90143 and previous config saved to /var/cache/conftool/dbconfig/20260401-091134-fceratto.json * 09:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance * 09:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90142 and previous config saved to /var/cache/conftool/dbconfig/20260401-091109-fceratto.json * 09:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 
wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 09:05 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1368.eqiad.wmnet with reason: host reimage * 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 09:01 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90141 and previous config saved to /var/cache/conftool/dbconfig/20260401-090101-fceratto.json * 09:00 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage * 08:57 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 skin ([[phab:T406724|T406724]]) * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1368 * 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1368 * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache 
wikikube-worker1368.eqiad.wmnet 202.48.64.10.in-addr.arpa 2.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 08:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:52 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1368 - ayounsi@cumin1003" * 08:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P90139 and previous config saved to /var/cache/conftool/dbconfig/20260401-085053-fceratto.json * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:45 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:44 moritzm: installing Apache security updates * 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host an-druid1003 * 08:44 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host an-druid1003 * 08:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bookworm * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:42 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 08:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90138 and previous config saved to 
/var/cache/conftool/dbconfig/20260401-084047-fceratto.json * 08:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1162 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90137 and previous config saved to /var/cache/conftool/dbconfig/20260401-083733-fceratto.json * 08:37 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance * 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90136 and previous config saved to /var/cache/conftool/dbconfig/20260401-083709-fceratto.json * 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1014.eqiad.wmnet * 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90135 and previous config saved to /var/cache/conftool/dbconfig/20260401-082701-fceratto.json * 08:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 08:24 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1368 * 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1368.eqiad.wmnet with OS trixie * 08:21 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs [[phab:T420480|T420480]] * 08:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1014.eqiad.wmnet * 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1013.eqiad.wmnet * 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P90134 and previous config saved to /var/cache/conftool/dbconfig/20260401-081652-fceratto.json * 08:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1367.eqiad.wmnet with OS trixie * 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1013.eqiad.wmnet * 08:07 moritzm: upgrading Envoy on the Puppet servers to 1.35.9 [[phab:T419637|T419637]] [[phab:T410975|T410975]] * 08:06 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90132 and previous config saved to /var/cache/conftool/dbconfig/20260401-080644-fceratto.json * 08:03 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' . 
* 07:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:52 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1367.eqiad.wmnet with reason: host reimage * 07:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1156 ([[phab:T419635|T419635]])', diff saved to https://phabricator.wikimedia.org/P90130 and previous config saved to /var/cache/conftool/dbconfig/20260401-074704-fceratto.json * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance * 07:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1367 * 07:40 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1367 * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1367.eqiad.wmnet 201.48.64.10.in-addr.arpa 1.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 07:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:39 
ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1367 - ayounsi@cumin1003" * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:36 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:35 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 07:35 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1367 * 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1367.eqiad.wmnet with OS trixie * 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1012.eqiad.wmnet * 07:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps1012.eqiad.wmnet * 07:27 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 moritzm: installing postgresql security updates * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:26 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:24 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps1011.eqiad.wmnet * 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host 
maps1011.eqiad.wmnet * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:14 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1366.eqiad.wmnet with OS trixie * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 07:09 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:50 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1366.eqiad.wmnet with reason: host reimage * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1366 * 06:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1366 * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:37 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 06:35 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1366 * 06:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1366.eqiad.wmnet 200.48.64.10.in-addr.arpa 
0.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0) * 06:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:34 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1366 - ayounsi@cumin1003" * 06:30 ayounsi@cumin1003: START - Cookbook sre.dns.netbox * 06:30 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1366 * 06:29 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1366.eqiad.wmnet with OS trixie * 05:56 marostegui: Drop empty tables cusi_case, cusi_user, and cusi_signal on wikis not listed at checkuser-suggested-investigations.dblist [[phab:T421353|T421353]] * 05:33 marostegui: Drop empty ores_classification and ores_model on closed wikis [[phab:T420093|T420093]] * 05:26 marostegui: Drop global_block_whitelist on closed wikis [[phab:T420525|T420525]] * 02:24 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply * 02:14 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply * 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 23s) * 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:51 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:12 
otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 01:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 08m 35s) * 00:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:56 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply * 00:55 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:53 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265629{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:40 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] (duration: 12m 40s) * 00:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:29 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] synced to the testservers (see 
https://wikitech.wiki * 00:27 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265631{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265630{{!}}util.js: Allow passing isVectorized to adjustThumbWidthForSteps (T414805 T411013 T421589)]], [[gerrit:1265632{{!}}Pass whether the image is svg to adjustThumbWidthForSteps (T414805 T411013 T421589)]] * 00:07 ladsgroup@deploy1003: Finished scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] (duration: 06m 50s) * 00:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync * 00:03 ladsgroup@deploy1003: ladsgroup: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there. * 00:00 ladsgroup@deploy1003: Started scap sync-world: Backport for [[gerrit:1265623{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]], [[gerrit:1265624{{!}}LinksUpdate: Consolidate links virtual domains (T421914)]] == Other archives == See [[Server Admin Log/Archives]]. 
<noinclude> [[Category:SAL]] [[Category:Operations]] </noinclude> f0f2m19ucvo4ugnl5pji4vduek1zwct Release Engineering/SAL 0 17290 2398820 2398766 2026-04-03T16:18:08Z Stashbot 7414 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" | T421723 2398820 wikitext text/x-wiki === 2026-04-03 === * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`. Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: free up a bit of disk space by deleting cache for AhoCorasick/ CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 
([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+ years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 persons to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead. This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of `configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) * 15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler 
agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pickup new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: 
search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa. -- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." 
- [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for [[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating 
all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: [quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report 
back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] === 2026-02-28 === * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. 
* 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45 Lucas_WMDE: 
ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after test bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]]) === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners. 
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> 36b0c31c3oopi8xne1xr8xqj5yp2xsw 2398823 2398820 2026-04-03T17:18:38Z Stashbot 7414 bd808: Unblock 2.54.128.0/19 (T422238) 2398823 wikitext text/x-wiki === 2026-04-03 === * 17:18 bd808: Unblock 2.54.128.0/19 ([[phab:T422238|T422238]]) * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`. Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: free up a bit of disk space by deleting cache for AhoCorasick/ 
CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 ([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+ years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 persons to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead. This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of `configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) *
15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pickup new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 
jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa. -- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." 
- [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for [[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating
all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: [quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report 
back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] === 2026-02-28 === * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. 
* 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45 Lucas_WMDE: 
ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after test bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]]) === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners. 
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> mumeci7u2g7ezk08y6zg5c88jtw5l2n 2398824 2398823 2026-04-03T17:25:38Z Stashbot 7414 bd808: Unblock 31.18.0.0/16 (T422245) 2398824 wikitext text/x-wiki === 2026-04-03 === * 17:25 bd808: Unblock 31.18.0.0/16 ([[phab:T422245|T422245]]) * 17:18 bd808: Unblock 2.54.128.0/19 ([[phab:T422238|T422238]]) * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`. Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: 
free up a bit of disk space by deleting cache for AhoCorasick/ CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 ([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+ years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 persons to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead. This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of
`configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) * 15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pickup new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate 
VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa. -- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." 
- [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for [[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating
all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: [quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report 
back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] === 2026-02-28 === * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. 
* 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45 Lucas_WMDE: 
ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after test bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]] === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners.
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> 4xht6b3b324pwq16vfhoo9oo0fyyiuu 2398834 2398824 2026-04-03T20:17:49Z Stashbot 7414 bd808: Unblock 2.54.0.0/16 (T422238) 2398834 wikitext text/x-wiki === 2026-04-03 === * 20:17 bd808: Unblock 2.54.0.0/16 ([[phab:T422238|T422238]]) * 17:25 bd808: Unblock 31.18.0.0/16 ([[phab:T422245|T422245]]) * 17:18 bd808: Unblock 2.54.128.0/19 ([[phab:T422238|T422238]]) * 16:18 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1264649 "add Python 3.14 to pywikibot jobs and separate lint tests" {{!}} [[phab:T421723|T421723]] * 09:26 hashar: integration: nuked pywikibot/core pre-commit cache # [[phab:T422242|T422242]] * 09:15 hashar: Added Bookworm based Jenkins agents to the pool with label `Docker`. 
Hostnames are `integration-agent-docker-107*` # [[phab:T421114|T421114]] * 02:47 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1267398 === 2026-04-02 === * 16:50 thcipriani: restart jenkins * 15:15 bd808: Unblock 82.216.0.0/16 ([[phab:T421508|T421508]]) * 15:07 bd808: Unblock 95.90.0.0/15 ([[phab:T421485|T421485]]) * 11:19 James_F: Zuul: [oojs/ui] Drop ooui-ruby2.7-rake job, we're abandoning Ruby use there === 2026-04-01 === * 22:01 bd808: Unblock 109.144.0.0/12 ([[phab:T422019|T422019]]) * 20:16 bd808: Unblock 93.192.0.0/10 ([[phab:T421894|T421894]]) * 19:25 dancy: Updating buildkitd to v0.29.0 in gitlab-cloud-runners (prod) ([[phab:T415284|T415284]]) * 17:57 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/97 ([[phab:T420441|T420441]]) * 17:39 bd808: Unblock 94.134.0.0/15 ([[phab:T421866|T421866]]) * 16:31 dancy: Upgrade buildkit to 0.29.0 in staging gitlab-cloud-runners ([[phab:T415284|T415284]]) * 10:47 taavi: integration-castor05: free up a bit of disk space by deleting cache for AhoCorasick/ CLDRPluralRuleParser/ HtmlFormatter/ RelPath/ RunningStat/ IPSet/ === 2026-03-30 === * 22:01 bd808: Unblock 78.20.0.0/14 ([[phab:T421586|T421586]]) * 21:04 bd808: Unblock 95.88.0.0/15 ([[phab:T421774|T421774]]) * 20:49 bd808: Unblock 95.89.191.0/24 ([[phab:T421774|T421774]]) * 20:29 bd808: Unblock 73.162.0.0/16 ([[phab:T421549|T421549]]) * 13:10 hashar: gerrit: abandon mediawiki/core changes that are 2+ years old and are attached to a task (`Bug: Txxxx`) * 11:37 hashar: Reloaded Zuul to add 3 persons to the allow list * 10:43 James_F: Docker: Re-pushing to try to create quibble-coverage 1.16.0-s2 === 2026-03-27 === * 21:00 James_F: Docker: [quibble-bullseye] Drop Python 2 from images * 11:28 hashar: deployment-prep: removed block for `143.176.0.0/15` and blocked subblock `143.176.0.0/16` instead.
This unblocks `143.177.0.0/16` # [[phab:T421420|T421420]] * 00:18 bd808: Unblock 95.90.238.0/23 ([[phab:T421447|T421447]]) === 2026-03-26 === * 21:25 bd808: Unblock 89.240.0.0/15 ([[phab:T421364|T421364]]) * 21:09 brennen: patchdemo: deploy to production for https://gitlab.wikimedia.org/repos/test-platform/catalyst/patchdemo/-/merge_requests/312 === 2026-03-25 === * 20:41 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256318 [[phab:T421283|T421283]] * 15:46 dancy: Migrated gitlab-cloud-runners (prod) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 15:32 dancy: Migrated gitlab-cloud-runners (staging) from nginx-ingress to traefik ([[phab:T420743|T420743]]) * 10:01 hashar: Updating tox Jenkins jobs to add support for Python 3.14 {{!}} https://gerrit.wikimedia.org/r/1260632 {{!}} [[phab:T421209|T421209]] * 08:40 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20/ === 2026-03-24 === * 19:40 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255746 * 15:34 brennen: gitlab1004: manual test run of `configure-projects` with cleared issue allowlist ([[phab:T412882|T412882]]) * 15:26 bd808: Unblock 47.194.0.0/16 ([[phab:T421127|T421127]]) * 12:53 hashar: integration: deleted old Puppet 5 compiler agents from Jenkins ( pcc-worker1014.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1015.puppet-diffs.eqiad1.wikimedia.cloud , pcc-worker1016.puppet-diffs.eqiad1.wikimedia.cloud ) # [[phab:T367399|T367399]] * 07:42 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1259755 === 2026-03-23 === * 15:28 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 90272 === 2026-03-22 === * 14:52 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1258082 * 01:00 Krinkle: Reloading Zuul to deploy 
https://gerrit.wikimedia.org/r/1256488 === 2026-03-21 === * 08:10 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256962 * 07:48 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1256946 === 2026-03-20 === * 21:21 bd808: Unblock 103.159.218.0/24 ([[phab:T420530|T420530]]) * 14:59 James_F: Zuul: [mediawiki/extensions/AbuseFilter] Add dependency on CodeMirror, for [[phab:T399673|T399673]] === 2026-03-19 === * 16:54 Krinkle: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1255777 * 16:01 Krinkle: Hoist l10n-bot rights from labs/tools parent to labs parent to reduce duplication in other labs/ repos * 15:50 Krinkle: Create labs/xtools repo (branch: main, parent: labs, owner: labs-xtools), ref [[phab:T402086|T402086]] === 2026-03-18 === * 21:11 dcausse: [[phab:T403775|T403775]]: reindexing all wikis to enable new sorting options * 21:08 dcausse: restarting opensearch on deployment-cirrussearch(12{{!}}13{{!}}14) instances to pickup new plugin versions * 14:56 James_F: Zuul: Handle wmf/next the same way as wmf/branch_cut_pretest * 14:52 James_F: Zuul: [GrowthExperiments] drop duplicate VisualEditor dep * 14:52 James_F: Zuul: [search/*] Add experimental Java 25 jobs === 2026-03-17 === * 22:50 James_F: Zuul: [mediawiki/extensions/JsonForms] Add quibble jobs * 21:27 James_F: Zuul: search: Update opensearch plugins for Java 11/17, for [[phab:T420407|T420407]] * 20:20 bd808: Resize deployment-sessionstore06 from g4.cores1.ram2.disk20 to g4.cores2.ram4.disk20 ([[phab:T415021|T415021]]) * 16:43 James_F: Zuul: [BlueSpicePermissionManager] Add …ConfigManager & …UserManager deps * 14:36 James_F: Zuul: [mediawiki/extensions/ArticleGuidance]: Add SpamBlacklist as phan dep, for [[phab:T420015|T420015]] === 2026-03-13 === * 13:59 andrewbogott: deleting ptr record 117.0.16.172.in-addr.arpa. 
-- accidental duplicate for deployment-kafka-logging01.deployment-prep.eqiad1.wikimedia.cloud * 13:04 elukey: re-create kafka-logging-01 in deployment-prep on trixie and Kafka 3.7 (was running on buster) * 09:13 elukey: upgrade kafka-jumbo and kafka-main to Confluent 7.7 in deployment-prep (pre-requisite before being able to upgrade to Trixie) === 2026-03-12 === * 21:23 bd808: Hard reboot deployment-sessionstore06 ([[phab:T415021|T415021]]) * 01:14 James_F: Docker: [helm-linter] Bump for Envoy 1.35.9, for [[phab:T419637|T419637]] === 2026-03-11 === * 16:48 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/MetricsPlatform # [[phab:T417568|T417568]] * 16:47 James_F: Zuul: [mediawiki/extensions/MetricsPlatform] Archive, for [[phab:T416865|T416865]] * 11:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1250529 "inference-services: Split policy violation CI into separate model jobs." - [[phab:T418832|T418832]] === 2026-03-10 === * 17:39 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner production * 17:11 hashar: Updated MediaWiki coverage jobs so that they now keep "Generate a local configuration by running `composer phpunit:config`" message # [[phab:T419073|T419073]] * 16:41 dduvall: deployed reggie v1.18.0 to gitlab-cloud-runner staging * 08:21 codders: integration: integration-castor05: rm -fR /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 === 2026-03-09 === * 21:53 bd808: Reboot deployment-shellbox01 on the off chance that it makes the new permissions error go away ([[phab:T419440|T419440]]) * 13:13 James_F: Zuul: [mediawiki/extensions/WikiShare] Mark as archived, for [[phab:T413589|T413589]] * 13:11 James_F: Zuul: [mediawiki/extensions/Memento] Mark as archived, for [[phab:T369991|T369991]] * 13:10 James_F: Zuul: [mediawiki/extensions/QuickGV] Mark as archived, for [[phab:T413348|T413348]] * 13:10 James_F: Zuul: [mediawiki/extensions/SemanticImageInput] Mark as archived, for
[[phab:T413588|T413588]] * 13:09 James_F: Zuul: [mediawiki/extensions/SidebarDonateBox] Mark as archived, for [[phab:T413587|T413587]] * 13:07 James_F: Zuul: [mediawiki/extensions/SemanticSifter] Mark as archived, for [[phab:T413586|T413586]] * 13:06 James_F: Zuul: [mediawiki/extensions/GoogleAdSense] Mark as archived, for [[phab:T413585|T413585]] * 13:04 James_F: Zuul: [mediawiki/extensions/SecurityAPI] Mark as archived, for [[phab:T418008|T418008]] * 12:50 James_F: Zuul: [mediawiki/extensions/CheckUser] Add DiscussionTools dependency * 12:50 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add dependencies for TestKitchen * 10:40 hashar: gerrit: mediawiki/vendor: converted `es6` and `es710` branches to tags # [[phab:T417804|T417804]] * 09:24 hashar: Updating Quibble jobs to 1.16.0 {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1248880 {{!}} [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 09:15 hashar: updating all CI Jenkins jobs using `./jjb-update` === 2026-03-06 === * 19:46 James_F: Zuul: [mediawiki/services/geoshapes] Mark as archived, for [[phab:T418372|T418372]] * 16:37 hashar: Building Docker images for Quibble 1.16.0 * 16:31 hashar: Tag Quibble 1.16.0 @ {{Gerrit|0b9db5fe3cabb2cec0b5d44e128bafa917b3b895}} # [[phab:T417399|T417399]] [[phab:T417409|T417409]] [[phab:T418461|T418461]] * 12:32 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248411 "jjb, Zuul: vary Wikibase Selenium for release branches" {{!}} [[phab:T418797|T418797]] * 12:12 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1248409/ "jjb, Zuul: rename wikibase-selenium job for clarity" {{!}} [[phab:T418797|T418797]] === 2026-03-05 === * 14:41 James_F: Zuul: [mediawiki/skins/MinervaNeue] Add TestKitchen as a dependency for [[phab:T418053|T418053]] * 08:01 hashar: Reloaded Zuul to rename wikibase-client / wikibase-repo jobs {{!}} https://gerrit.wikimedia.org/r/1238317 * 00:04 James_F: Docker: 
[quibble-coverage] Use local PHPUnit config, for [[phab:T345481|T345481]] === 2026-03-04 === * 21:16 James_F: Zuul: [mediawiki/core] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 21:10 James_F: Zuul: [mediawiki/vendor] Make PHP 8.5 voting on master branch, for [[phab:T411814|T411814]] * 19:48 brennen: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/96 ([[phab:T419004|T419004]]) * 18:50 James_F: Revert "Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency", for [[phab:T419043|T419043]] * 16:23 James_F: Zuul: [mediawiki/services/parsoid] Make PHP 8.4 voting * 15:37 James_F: Docker: [rake-ruby2.7] Add libffi-dev too, for [[phab:T418463|T418463]] * 13:59 James_F: Docker: [rake-ruby2.7] Add ruby-ffi for [[phab:T418463|T418463]] * 13:54 hashar: SIGKILL Zuul cause it can't gracefully stop most probably due to being locked attempting to report back to Gerrit # [[phab:T419009|T419009]] * 13:49 hashar: Stopping Zuul # [[phab:T419009|T419009]] * 13:41 hashar: Took a Zuul stack dump on contint1002.wikimedia.org using SIGUSR1 # [[phab:T419009|T419009]] === 2026-03-03 === * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaMessages] Drop MetricsPlatform phan dep * 23:52 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Drop MetricsPlatform phan dep === 2026-03-02 === * 22:13 James_F: Zuul: Enforce PHP 8.4 in MW extensions and skins for development branch, for [[phab:T386108|T386108]] * 14:05 James_F: Zuul: [mediawiki/extensions/MobileFrontend] Add ParserMigration dependency, for [[phab:T415451|T415451]] * 13:48 James_F: Zuul: […/WikimediaEvents] Drop LoginNotify dependency, now unused, for [[phab:T404334|T404334]] * 10:16 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/quibble-vendor-mysql-php83-selenium/Cypress/15.8.2/ # [[phab:T418718|T418718]] === 2026-02-28 
=== * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/skins` # [[phab:T418675|T418675]] * 21:33 hashar: gerrit: triggering replication to GitHub for all of `mediawiki/extensions` # [[phab:T418675|T418675]] === 2026-02-27 === * 15:53 dancy: Updating gitlab-cloud-runners (staging and prod) to gitlab-runner 18.9.0. === 2026-02-26 === * 20:16 James_F: Zuul: Provide a custom, high-priority pipeline just for puppet compiler [[phab:T414621|T414621]] * 19:32 James_F: Docker: Bump all the PHPs. * 13:40 hashar: Deployed Jenkins job https://integration.wikimedia.org/ci/job/wikibase-selenium/ # [[phab:T287582|T287582]] * 00:13 dduvall: forcing replacement of buildkitd helm release in gitlab-cloud-runner prod cluster due to dependency on removed k8s secret ([[phab:T416260|T416260]]) === 2026-02-25 === * 23:50 dduvall: deploying https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/merge_requests/552 to gitlab-cloud-runner production cluster ([[phab:T416260|T416260]]) * 14:07 James_F: Zuul: [mediawiki/extensions/CommunityRequests] Add TemplateData dependency, for [[phab:T401638|T401638]] * 00:08 jeena: no-op testing updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/95 === 2026-02-24 === * 15:55 brennen: devtools: test deploy phab/phorge to test instance ([[phab:T418256|T418256]]) === 2026-02-23 === * 23:07 jeena: Updated development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:43 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/92 * 22:12 bd808: Unblock 191.80.192.0/18 ([[phab:T418132|T418132]]) * 20:26 hashar: Deleted "replication-upstream" Grafana dashboard in favor of a copy/new "replication" one. 
https://grafana.wikimedia.org/d/RFLS1GsWk/replication-upstream , replaced it by https://grafana.wikimedia.org/d/d4a4da73-c27f-4ce6-a9e5-ab84dd7a4ebb/replication * 16:29 James_F: Zuul: [3d2png] Add basic Node CI at version 20 === 2026-02-20 === * 21:47 bd808: Unblock 168.184.84.0/24 ([[phab:T418020|T418020]]) * 17:13 bd808: Unblock 122.187.64.0/18 ([[phab:T417964|T417964]]) * 14:35 James_F: Zuul: [mediawiki/extensions/Monstranto] Move out of Wikimedia prod section === 2026-02-19 === * 18:34 bd808: Unblock 181.98.0.0/16 ([[phab:T417890|T417890]]) * 17:21 James_F: Zuul: [mediawiki/extensions/WikimediaEvents] Add AbuseFilter as a dependency, for [[phab:T417799|T417799]] * 13:22 hashar: Reloaded Zuul to archive the Cergen repository {{!}} https://gerrit.wikimedia.org/r/c/integration/config/+/1240688 {{!}} [[phab:T417887|T417887]] === 2026-02-18 === * 20:17 jeena: Updating development images on contint primary for [[phab:T415922|T415922]] * 19:44 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240360 * 18:40 bd808: Unblock 46.59.0.0/17 ([[phab:T417747|T417747]]) * 17:05 hashar: Regenerating Jenkins jobs with JJB based on https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 17:04 hashar: Added EXT_DEPENDENCIES to Quibble Jenkins jobs parameters so we can manually trigger them from the Web UI using a different set of deps # https://gerrit.wikimedia.org/r/c/integration/config/+/1240254/ * 16:30 hashar: Triggered https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/ with empty Zuul parameters introduced by https://gerrit.wikimedia.org/r/1240333 {{!}} https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/4893/console * 15:43 James_F: Zuul: [mediawiki/extensions/ReadingLists] Add EventBus dependency for [[phab:T417706|T417706]] * 12:15 hashar: zuul-1001.zuul3.eqiad1.wikimedia.cloud: added keepalive=20 to the scheduler Gerrit driver and restarted scheduler container # [[phab:T417497|T417497]] * 06:58 jeena: 
Updating development images on contint primary for [[phab:T415922|T415922]] === 2026-02-17 === * 23:37 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240081 * 23:20 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1240078 * 15:58 brennen: deployed latest phab/phorge wmf/stable to devtools test instance ([[phab:T417657|T417657]]) * 09:01 hashar: Reloaded Zuul to enable php 8.5 testing on utfnormal, php-session-serializer, wikipeg, mediawiki/libs/Dodo, mediawiki/libs/UUID, testing-access-wrapper and translatewiki # [[phab:T406326|T406326]] === 2026-02-16 === * 15:27 hashar: Manually cleaned some old workspaces on integration-agent-docker-1042 === 2026-02-12 === * 20:07 James_F: Zuul: Enable PHP 8.5 jobs for most MW libraries, for [[phab:T406326|T406326]] * 19:33 James_F: Docker: [php83] Re-build with upstream's new 8.3.30 release and cascade * 19:31 James_F: Zuul: Add PHP 8.5 CI job to various things noted as blocked by Phan, for [[phab:T410941|T410941]], [[phab:T406326|T406326]] * 16:35 Krinkle: Disable publishing noise on tasks from repos Bcp47, clover-diff, ScopedCallback, and IDLeDOM. 
Ref [[phab:T143162|T143162]] * 15:53 dancy: Updating development images on contint primary for https://gitlab.wikimedia.org/repos/releng/dev-images/-/merge_requests/87 * 11:21 James_F: Zuul: [mediawiki/libs/shellbox] Add direct Phan job, for [[phab:T416064|T416064]] === 2026-02-10 === * 20:16 dancy: Rebooted k3s.catalyst-dev (it was unresponsive, but the reboot hasn't helped) === 2026-02-09 === * 21:58 James_F: Zuul: [mediawiki/tools/phan] Add PHP 8.5 CI job, for [[phab:T410941|T410941]] * 19:46 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1238006 [[phab:T415680|T415680]] * 11:51 James_F: Zuul: [mediawiki/extensions/ReadingLists] Drop MetricsPlatform dependency, for [[phab:T414435|T414435]] === 2026-02-05 === * 17:58 James_F: Zuul: […/WikimediaCustomizations] Add six new dependencies for [[phab:T404334|T404334]] * 15:35 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1237254 * 15:18 James_F: Zuul: […/OATHAuth] Add dependency and phan dependency on CentralAuth === 2026-02-04 === * 12:54 James_F: Zuul: [mediawiki/extensions/Petition] Add CLDR dependency * 10:03 hashar: Restarted Jenkins on releases2003.codfw.wmnet === 2026-02-02 === * 21:17 hashar: Reloaded Zuul for https://gerrit.wikimedia.org/r/c/integration/config/+/1234926 "re-enable master jobs for some BlueSpice repos - [[phab:T403196|T403196]]" * 21:05 bd808: Unblock 85.146.0.0/17 ([[phab:T416079|T416079]]) * 19:47 James_F: Zuul: […/WikimediaCustomizations] Add cldr phan dependency, for [[phab:T404334|T404334]] * 17:33 bd808: Unblock 188.188.0.0/15 ([[phab:T416095|T416095]]) * 17:26 bd808: Unblock 85.94.84.0/22 ([[phab:T416105|T416105]]) * 17:09 bd808: Unblock 94.234.0.0/16 ([[phab:T416165|T416165]]) * 16:51 dancy: Update gitlab-runners to alpine-v18.6.6 ([[phab:T415214|T415214]]) * 16:27 bd808: Unblock 47.231.208.0/21 ([[phab:T416010|T416010]]) * 11:39 James_F: Zuul: […/WikimediaCustomizations] Add five new phan dependencies, for [[phab:T404334|T404334]] * 09:45 Lucas_WMDE: 
ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 58532, 58557 === 2026-01-31 === * 21:49 James_F: Deleted Jenkins's job entry for castor-save-workspace-cache {{Gerrit|6193776}} and this seems to have unstuck things for [[phab:T416078|T416078]]? * 21:45 James_F: Running `sudo systemctl restart jenkins` on contint for [[phab:T416078|T416078]] * 21:44 James_F: Fighting [[phab:T416078|T416078]], took integration-castor-5 offline, disconnected, sshed in to kill threads, then reconnected; no change in aspect. * 19:03 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1235380 === 2026-01-28 === * 21:26 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/WebAuthn # [[phab:T415832|T415832]] * 21:11 bd808: Unblock 181.160.0.0/15 & 186.40.128.0/17 ([[phab:T415820|T415820]]) * 17:01 bd808: Unblock 102.182.0.0/16 ([[phab:T415782|T415782]]) === 2026-01-27 === * 16:45 James_F: Zuul: Switch skin-quibble template with identical extension-quibble, for [[phab:T402398|T402398]] * 16:18 James_F: Zuul: [ArticleGuidance] mention it will be in production * 15:55 James_F: Docker: [quibble-bullseye] Update to Quibble 1.15.0 * 15:12 James_F: Docker: [quibble-coverage] Pass PHPUnit config location explicitly, for [[phab:T395470|T395470]] * 09:18 hashar: integration: on integration-castor05, deleted caches for old MediaWiki branches * 09:15 hashar: integration: on pkgbuilder instances, removed Buster cow images, aptcache and hooks. 
`sudo cumin --force -p 0 'name:pkgbuilder' 'rm -fR /srv/pbuilder/<nowiki>{</nowiki>base-buster-amd64.cow,hooks/buster,aptcache/buster-amd64<nowiki>}</nowiki>'` # [[phab:T397209|T397209]] * 09:14 hashar: integration: cleaned up old workspaces under /srv/jenkins/workspace === 2026-01-26 === * 23:27 bd808: Unblock 66.130.0.0/15 ([[phab:T415596|T415596]]) * 22:52 bd808: Unblock 45.16.0.0/12 ([[phab:T415467|T415467]]) * 14:46 hashar: gerrit: changed `operations/software/permissions` project type from `CODE` to `PERMISSIONS` by pointing `HEAD` to `refs/meta/config` === 2026-01-22 === * 17:36 James_F: Docker: [quibble-coverage] Stop using legacy PHPUnit entrypoint ([[phab:T395470|T395470]]) & Stop excluding Dump/ParserFuzz/Stub groups ([[phab:T415230|T415230]]) * 15:11 James_F: Zuul: [mediawiki/extensions/Math] Add a standalone job, for [[phab:T415230|T415230]] === 2026-01-20 === * 20:38 bd808: Cherry picked https://gerrit.wikimedia.org/r/c/operations/puppet/+/1229186 ([[phab:T415113|T415113]]) * 19:05 bd808: Rebooted deployment-cache-text08 to see if the mystery haproxy startup failure would go away ([[phab:T415100|T415100]]) * 18:50 bd808: Unblock 152.7.0.0/16 ([[phab:T415100|T415100]]) === 2026-01-17 === * 23:32 ori: beta-scap with `php_l10n: true` completed successfully: https://integration.wikimedia.org/ci/view/Beta/job/beta-scap-sync-world/241466/console. PHP l10n files generated. Reverted local change to scap.cfg. * 23:26 ori: Temporarily set `php_l10n: true` on deployment-deploy04:/etc/scap.cfg to see if next scap succeeds. 
=== 2026-01-16 === * 16:33 dancy: Deleting deployment-mx03.deployment-prep ([[phab:T412975|T412975]]) === 2026-01-15 === * 14:50 James_F: jforrester@doc1004:~$ sudo -u doc-uploader rm -rf /srv/doc/cover-extensions/ArticleSummaries/ # [[phab:T413232|T413232]] === 2026-01-14 === * 17:14 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226907 * 16:27 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1226893 * 15:57 bd808: Unblock 190.60.63.0/24 ([[phab:T414541|T414541]]) === 2026-01-13 === * 15:04 James_F: Zuul: Make quibble-for-mediawiki-core-vendor-mysql-php84 voting, for [[phab:T386108|T386108]] === 2026-01-12 === * 21:33 zabe: zabe@deployment-mwmaint03:~$ foreachwiki migrateLinksTable.php --table imagelinks # [[phab:T413668|T413668]] * 21:06 bd808: Unblock 66.81.168.0/21 ([[phab:T414303|T414303]]) * 17:42 dancy: Turned off instance deployment-prep.deployment-mx03 * 11:44 Lucas_WMDE: ssh integration-castor05.integration.eqiad1.wikimedia.cloud sudo -u jenkins-deploy rm -rf /srv/castor/castor-mw-ext-and-skins/master/mediawiki-node20 # fix failure seen in mediawiki-node20 46331, 46344 === 2026-01-10 === * 21:48 taavi: reload zuul for https://gerrit.wikimedia.org/r/1224782 * 00:25 bd808: Unblock 91.160.0.0/12 ([[phab:T414190|T414190]]) === 2026-01-09 === * 17:33 thcipriani: re-enabling beta update jobs after testing bad extension-list [[phab:T411516|T411516]] * 17:09 thcipriani: disabling beta update jobs to test bad extension-list [[phab:T411516|T411516]] === 2026-01-08 === * 21:30 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224815 [[phab:T414136|T414136]] * 18:24 bd808: Unblock 89.80.0.0/12 ([[phab:T414113|T414113]]) * 15:55 dancy: Upgrading gitlab-runner to v18.5.0 on gitlab-cloud-runners. 
([[phab:T414053|T414053]]) === 2026-01-07 === * 23:17 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1082574 https://gerrit.wikimedia.org/r/1224157 https://gerrit.wikimedia.org/r/1224159 * 23:12 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/896311 [[phab:T27482|T27482]] * 23:06 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1224218 * 17:34 James_F: Zuul: Add new extensions: IssueTrackerLinks, PreviewLinks, and WikiRAG * 17:34 James_F: Zuul: [labs/tools/heritage] Point to the task to drop 8.1 testing * 15:09 James_F: Zuul: [labs/tools/heritage] Add testing in PHP 8.2+, not just PHP 8.1 * 15:03 James_F: Zuul: Even for extension-broken, don't offer PHP 8.1 testing * 15:02 James_F: Zuul: Move quibble experimental sqlite/postgres tests to PHP 8.3 === 2026-01-06 === * 16:57 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223690 [[phab:T411814|T411814]] * 16:16 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1223189 [[phab:T411814|T411814]] * 00:30 bd808: Unblock 85.134.128.0/17 ([[phab:T413755|T413755]]) * 00:02 bd808: Unblock 89.166.128.0/17 ([[phab:T413702|T413702]]) === 2026-01-05 === * 23:57 bd808: Unblock 185.233.104.0/22 ([[phab:T413472|T413472]]) * 23:51 bd808: Unblock 45.62.112.0/21 ([[phab:T413079|T413079]]) * 23:44 bd808: Unblock 85.134.200.0/21 ([[phab:T413067|T413067]]) * 19:03 dancy: Updated buildkitd to v0.26.3 in gitlab-cloud-runners * 14:27 taavi: reload zuul for {{Gerrit|1223191}} * 13:57 James_F: Zuul: [mediawiki/php/wmerrors] Enable PHP 8.5 testing, for [[phab:T410921|T410921]] === 2026-01-03 === * 17:59 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222709 https://gerrit.wikimedia.org/r/1220388 https://gerrit.wikimedia.org/r/1219140 === 2026-01-02 === * 17:10 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1222597 === 2026-01-01 === * 02:34 Reedy: Reloading Zuul to deploy https://gerrit.wikimedia.org/r/1221644 <noinclude>'''Server Admin 
Log''' logged from {{IRC|wikimedia-releng}} for [[Nova Resource:Deployment-prep|Beta Cluster]], [[mw:Continuous integration|Continuous integration]] and various other Release Engineering projects.</noinclude> {{SAL-archives/Release Engineering}} <noinclude>[[Category:SAL]]</noinclude> gos91q23ja4pdqf8pmyog0gemm1rc79 User:Naturista2018 2 442484 2398847 2385595 2026-04-04T06:29:29Z Quiddity 1884 fix lint etc 2398847 wikitext text/x-wiki {|width="100%" align="center" |bgcolor="EBEBEB" align="center"|[[File:Nuvola_apps_edu_languages.png|30px]]'''[https://wikitech.wikimedia.org/wiki/User_talk:Naturista2018 Click here] If you want to leave a new message. Thank you!''' |} {{nowrap|1='''Current time''' <span id="purgelink" class="plainlinks" style="font-weight:normal;">('''[https://wikitech.wikimedia.org/w/index.php?title=User:Naturista2018&action=purge update]''')</span>}} '''{{CURRENTTIME}} {{CURRENTMONTHNAME}} {{CURRENTDAY}} {{CURRENTYEAR}} UTC, {{#time:H:i | -4 hours }} {{#time:F j Y|now|en| -4 hours }} (My Local time)''' {{userboxtop}} {{#Babel:es|en-3|ru-1|de-2|fr-2|pt-2|it-1|ar-1|bs-1|ro-1}} <div style="float: right; border:solid #999999 1px; margin: 1px;"> {| cellspacing="0" style="width: 238px; background: #eeeeee;" | style="width: 45px; height: 45px; background: {{{1|{{{id-c|#dddddd}}}}}}; text-align: center; font-size: 14pt; color: black;" | '''wiki''' | style="font-size: 8pt; padding: 4pt; line-height: 1.25em; color: black;" | This user is a '''[[w:Wikipedians|wikipedian]]'''. |}</div> {{Userbox | border-c = #E49B0F | info-fc = #000000 | info-c = white | id-c = white | id-op = border-right:1px solid #DAA520; vertical-align:bottom; | id = [[File:Wikignome crop.gif|45px]] | info = This [[w:Category:Wikipedian WikiGnomes|editor]] is a '''[[w:Wikipedia:WikiGnome|WikiGnome]]'''. 
| info-a = center | usercategory = Wikipedian WikiGnomes | nocat = {{{nocat|}}} }} {{top icon | imagename = Mushroom.svg | wikilink = w:Wikipedia:WikiGnome | description = This user is a WikiGnome | id = WikiGnome-icon | width = 20 | height = 20 }} {{userbox | border-c = #ccf | id = [[File:Star of life.svg|42x42px]] | id-c = #ccf | info = This user scored '''5002''' on the '''[[w:Wikipedia:Wikipediholism test|Wikipediholic test]]''' | info-c = #f8f8ff | info-lh = 1.1 | float = {{{float|left}}} }} <div style="float: left; border: solid #ffa500 1px; margin: 1px;"> {| cellspacing="0" style="width: 238px; color: #000000; background: #ffffe0;" | style="width: 45px; height: 45px; background: #ffa500; text-align: center;" | | style="font-size: 8pt; padding: 4pt; line-height: 1.25em;" |This '''[[w:Wikipedia:Wikipedians|wikipedian]]''' is '''[https://es.wikipedia.org/wiki/Naturismo naturist]''', because considers that is healthy for the '''[https://en.wikipedia.org/wiki/Outline_of_human_anatomy body]''' and the '''[https://en.wikipedia.org/wiki/Soul mind].''' |}</div> {{userbox | border-c = #6EF7A7 | id = [[File:LocationWHAmericas.png|45px]] | id-c = white | info = This user is a member of '''[[w::Wikipedia:WikiProject Americas|Wikiproject Americas]]''' | info-c = #AAEEBB | info-s = 8 }} {{userbox | border-c = black | id = [[File:Flag_of_Venezuela_(state).svg|45px]] | id-c = #F8EABA | info = This user is a member of '''[[w::Wikipedia:WikiProject Venezuela|Wikiproject Venezuela]]''' | info-c = #F8EABA | info-s = 8 }} <div style="float: left; border:solid {{{1|#CCC}}} 1px; margin: 1px;"> {| cellspacing="0" style="width: 238px; background: {{{2|#EEE}}};" | style="width: 45px; height: 45px; background: {{{1|#FFF;}}}; text-align: center; font-size: {{{5|14}}}pt; color: blue;" | '''{{{3|[[File:Wikivoyage-Logo-v3-icon.svg|40px|link=]]}}} | style="font-size: 8pt; padding: 4pt; line-height: 1.25em; color: #000000;" | Wikivoyage has a travel guide for 
'''[https://en.wikivoyage.org/wiki/User:Naturista2018 User:Naturista2018]''' |}</div> {{userboxbottom}} [[File:Venezuela_regions_map.png|500px|center]] [[File:Flag of Venezuela.svg|30px]] I am from '''[https://en.wikipedia.org/wiki/Venezuela Venezuela]'''. I have participated in [https://es.wikivoyage.org/wiki/Wikiviajes:Editat%C3%B3n_Wikiviajes_2018 '''Editatón Wikiviajes 2018'''] and in [https://meta.wikimedia.org/wiki/Contest:The_women_you_have_never_met '''The women you have never met'''] or [https://meta.wikimedia.org/wiki/The_Women_You_Have_Never_Met_2018 '''The women you have never met'''] and actually in [https://es.wikipedia.org/w/index.php?title=Wikipedia:PESCAR '''PESCAR'''] and in [https://wikimania2018.wikimedia.org/wiki/Special:MyLanguage/Wikimania '''WIKIMANIA 2018 CAPE TOWN'''] and in '''[https://es.wikipedia.org/wiki/Wikiproyecto:Mejora_de_art%C3%ADculos_esenciales Wikiproyecto:Mejora de artículos esenciales]''' and in '''[https://es.wikipedia.org/wiki/Wikiproyecto:Traducci%C3%B3n_de_art%C3%ADculos_destacados Wikiproyecto:Traducción de artículos destacados]''' epw28pp1zqa394bhcl3gzug1nqgzo9i 2398849 2398847 2026-04-04T08:23:18Z Quiddity 1884 indicator 2398849 wikitext text/x-wiki {|width="100%" align="center" |bgcolor="EBEBEB" align="center"|[[File:Nuvola_apps_edu_languages.png|30px]]'''[https://wikitech.wikimedia.org/wiki/User_talk:Naturista2018 Click here] If you want to leave a new message. 
Thank you!''' |} {{nowrap|1='''Current time''' <span id="purgelink" class="plainlinks" style="font-weight:normal;">('''[https://wikitech.wikimedia.org/w/index.php?title=User:Naturista2018&action=purge update]''')</span>}} '''{{CURRENTTIME}} {{CURRENTMONTHNAME}} {{CURRENTDAY}} {{CURRENTYEAR}} UTC, {{#time:H:i | -4 hours }} {{#time:F j Y|now|en| -4 hours }} (My Local time)''' {{userboxtop}} {{#Babel:es|en-3|ru-1|de-2|fr-2|pt-2|it-1|ar-1|bs-1|ro-1}} <div style="float: right; border:solid #999999 1px; margin: 1px;"> {| cellspacing="0" style="width: 238px; background: #eeeeee;" | style="width: 45px; height: 45px; background: {{{1|{{{id-c|#dddddd}}}}}}; text-align: center; font-size: 14pt; color: black;" | '''wiki''' | style="font-size: 8pt; padding: 4pt; line-height: 1.25em; color: black;" | This user is a '''[[w:Wikipedians|wikipedian]]'''. |}</div> {{Userbox | border-c = #E49B0F | info-fc = #000000 | info-c = white | id-c = white | id-op = border-right:1px solid #DAA520; vertical-align:bottom; | id = [[File:Wikignome crop.gif|45px]] | info = This [[w:Category:Wikipedian WikiGnomes|editor]] is a '''[[w:Wikipedia:WikiGnome|WikiGnome]]'''. 
| info-a = center | usercategory = Wikipedian WikiGnomes | nocat = {{{nocat|}}} }} <indicator name="This user is a WikiGnome">[[File:Mushroom.svg|20px|link=w:Wikipedia:WikiGnome]]</indicator> {{userbox | border-c = #ccf | id = [[File:Star of life.svg|42x42px]] | id-c = #ccf | info = This user scored '''5002''' on the '''[[w:Wikipedia:Wikipediholism test|Wikipediholic test]]''' | info-c = #f8f8ff | info-lh = 1.1 | float = {{{float|left}}} }} <div style="float: left; border: solid #ffa500 1px; margin: 1px;"> {| cellspacing="0" style="width: 238px; color: #000000; background: #ffffe0;" | style="width: 45px; height: 45px; background: #ffa500; text-align: center;" | | style="font-size: 8pt; padding: 4pt; line-height: 1.25em;" |This '''[[w:Wikipedia:Wikipedians|wikipedian]]''' is '''[https://es.wikipedia.org/wiki/Naturismo naturist]''', because considers that is healthy for the '''[https://en.wikipedia.org/wiki/Outline_of_human_anatomy body]''' and the '''[https://en.wikipedia.org/wiki/Soul mind].''' |}</div> {{userbox | border-c = #6EF7A7 | id = [[File:LocationWHAmericas.png|45px]] | id-c = white | info = This user is a member of '''[[w::Wikipedia:WikiProject Americas|Wikiproject Americas]]''' | info-c = #AAEEBB | info-s = 8 }} {{userbox | border-c = black | id = [[File:Flag_of_Venezuela_(state).svg|45px]] | id-c = #F8EABA | info = This user is a member of '''[[w::Wikipedia:WikiProject Venezuela|Wikiproject Venezuela]]''' | info-c = #F8EABA | info-s = 8 }} <div style="float: left; border:solid {{{1|#CCC}}} 1px; margin: 1px;"> {| cellspacing="0" style="width: 238px; background: {{{2|#EEE}}};" | style="width: 45px; height: 45px; background: {{{1|#FFF;}}}; text-align: center; font-size: {{{5|14}}}pt; color: blue;" | '''{{{3|[[File:Wikivoyage-Logo-v3-icon.svg|40px|link=]]}}} | style="font-size: 8pt; padding: 4pt; line-height: 1.25em; color: #000000;" | Wikivoyage has a travel guide for '''[https://en.wikivoyage.org/wiki/User:Naturista2018 User:Naturista2018]''' |}</div> 
{{userboxbottom}} [[File:Venezuela_regions_map.png|500px|center]] [[File:Flag of Venezuela.svg|30px]] I am from '''[https://en.wikipedia.org/wiki/Venezuela Venezuela]'''. I have participated in [https://es.wikivoyage.org/wiki/Wikiviajes:Editat%C3%B3n_Wikiviajes_2018 '''Editatón Wikiviajes 2018'''] and in [https://meta.wikimedia.org/wiki/Contest:The_women_you_have_never_met '''The women you have never met'''] or [https://meta.wikimedia.org/wiki/The_Women_You_Have_Never_Met_2018 '''The women you have never met'''] and actually in [https://es.wikipedia.org/w/index.php?title=Wikipedia:PESCAR '''PESCAR'''] and in [https://wikimania2018.wikimedia.org/wiki/Special:MyLanguage/Wikimania '''WIKIMANIA 2018 CAPE TOWN'''] and in '''[https://es.wikipedia.org/wiki/Wikiproyecto:Mejora_de_art%C3%ADculos_esenciales Wikiproyecto:Mejora de artículos esenciales]''' and in '''[https://es.wikipedia.org/wiki/Wikiproyecto:Traducci%C3%B3n_de_art%C3%ADculos_destacados Wikiproyecto:Traducción de artículos destacados]''' fbu3op0o9hpkt0bi8xgcr38j892m624 User talk:DCaro (WMF) 3 446778 2398808 2398781 2026-04-03T15:07:59Z DCaro (WMF) 18271 /* Webservice on Toolforge */ Reply 2398808 wikitext text/x-wiki == Welcome to Toolforge! == Hello David Caro, welcome to the Toolforge project! Your request for access was processed, and you should be able to use ssh to connect to <tt>login.toolforge.org</tt>. You will need to logout and login again at https://toolsadmin.wikimedia.org/ to activate your new permissions there. Check the [[Help:Toolforge|Toolforge help page]] for tips on using your account. You can also ask questions in our IRC channel at {{irc|wikimedia-cloud}} or send an e-mail to our mailing list <tt>cloud@lists.wikimedia.org</tt>. Thank you, and have fun making Tools! --[[User:StrikerBot|StrikerBot]] ([[User talk:StrikerBot|talk]]) 13:53, 2 November 2020 (UTC) == Wikimedia Hackathon Northwestern Europe 2026 == Hello! 
I came across your name on a previous Wikimedia hackathon participant page, so I thought you might be interested in this. We're organizing the [[mw:Wikimedia Hackathon Northwestern Europe 2026|Wikimedia Hackathon Northwestern Europe 2026]], taking place on '''13–14 March 2026''' in '''Arnhem, the Netherlands'''. It's a two-day, in-person hackathon for technical Wikimedians from the region. Since you've attended a hackathon before, you already know how valuable these events can be for collaboration, learning, and getting things done together. We'd love to have you join us! [https://docs.google.com/forms/d/e/1FAIpQLSdYOnOg1iq-8M4xWw8foHUw_7fReWTKtVH_GHzGI2_ozWww9Q/viewform '''Apply here'''] – registration closes mid-January or when full. Feel free to reach out if you have any questions. Hope to see you in Arnhem! [[User:Daanvr|Daanvr]] ([[User talk:Daanvr|talk]]) 14:59, 12 January 2026 (UTC) == Webservice on Toolforge == Hello. Two years ago you helped me move my webservice from the grid to k8s using a buildservice-created image. You advised me to switch from CGI/C#/mono to modern dotnet, which has native Linux support. I now plan to do that and have created a simple dotnet app in Visual Studio, but what should I do next to run it on Toolforge? Two years ago you said it should also be an image created by buildservice, like my old code. Could you explain how I should build a new image and run it on Toolforge? My current build scripts are here: https://github.com/Saisengen/wikibots/tree/main (you wrote them). [[User:MBH|MBH]] ([[User talk:MBH|talk]]) 12:46, 1 April 2026 (UTC) :Hi @[[User:MBH|MBH]]! I'm a bit busy right now. You can create the new code in a new repository, and use [[Help:Toolforge/My first .NET tool]] to build and deploy it. When building, you can also give the image a different name, like <code><nowiki>toolforge build start --image-name my-new-web https://gitlab.wikimedia.org/toolforge-repos/sample-dotnet-buildpack-app.git</nowiki></code>. 
I can try to give it a look eventually, but I can't commit to anything soon. [[User:DCaro (WMF)|DCaro (WMF)]] ([[User talk:DCaro (WMF)|talk]]) 15:27, 1 April 2026 (UTC) :: Thanks. I created a repo (https://github.com/Saisengen/webservice/tree/master) and built it successfully, but now I have one more issue. When I get a 500 server error on Toolforge (the issue does not occur when debugging locally on my PC), the error message is not written to the ''error.log'' file on the Toolforge filesystem, unlike the earlier behavior. Where can I read the error messages? [[User:MBH|MBH]] ([[User talk:MBH|talk]]) 11:15, 3 April 2026 (UTC) :::You can try <nowiki><code>toolforge we service logs</code></nowiki> (note that if the logs are older than 1h, you'll have to pass <nowiki><code>--since 5h</code></nowiki> for example) [[User:DCaro (WMF)|DCaro (WMF)]] ([[User talk:DCaro (WMF)|talk]]) 15:07, 3 April 2026 (UTC) otsret0yrzxoqzbilbqelkbrvmlbqvq 2398842 2398808 2026-04-04T04:32:06Z MBH 3865 /* Webservice on Toolforge */ reply to DCaro (WMF) ([[mw:c:Special:MyLanguage/User:JWBTH/CD|CD]]) 2398842 wikitext text/x-wiki == Welcome to Toolforge! == Hello David Caro, welcome to the Toolforge project! Your request for access was processed, and you should be able to use ssh to connect to <tt>login.toolforge.org</tt>. You will need to logout and login again at https://toolsadmin.wikimedia.org/ to activate your new permissions there. Check the [[Help:Toolforge|Toolforge help page]] for tips on using your account. You can also ask questions in our IRC channel at {{irc|wikimedia-cloud}} or send an e-mail to our mailing list <tt>cloud@lists.wikimedia.org</tt>. Thank you, and have fun making Tools! --[[User:StrikerBot|StrikerBot]] ([[User talk:StrikerBot|talk]]) 13:53, 2 November 2020 (UTC) == Wikimedia Hackathon Northwestern Europe 2026 == Hello! I came across your name on a previous Wikimedia hackathon participant page, so I thought you might be interested in this. 
We're organizing the [[mw:Wikimedia Hackathon Northwestern Europe 2026|Wikimedia Hackathon Northwestern Europe 2026]], taking place on '''13–14 March 2026''' in '''Arnhem, the Netherlands'''. It's a two-day, in-person hackathon for technical Wikimedians from the region. Since you've attended a hackathon before, you already know how valuable these events can be for collaboration, learning, and getting things done together. We'd love to have you join us! [https://docs.google.com/forms/d/e/1FAIpQLSdYOnOg1iq-8M4xWw8foHUw_7fReWTKtVH_GHzGI2_ozWww9Q/viewform '''Apply here'''] – registration closes mid-January or when full. Feel free to reach out if you have any questions. Hope to see you in Arnhem! [[User:Daanvr|Daanvr]] ([[User talk:Daanvr|talk]]) 14:59, 12 January 2026 (UTC) == Webservice on Toolforge == Hello. Two years ago you helped me to transfer my webservice from grid to k8s using buildservice-created image. You advised me to switch from CGI/C#/mono to modern dotnet, having native linux support. Now I plan to do it, I created a simple dotnet app in Visual Studio, but what to do next, how to run it on Toolforge? Two years ago you say it should be an image, created by buildservice, too, like my old code. Could you explain how should I built a new image and run it on Toolforge? My current building scripts are here: https://github.com/Saisengen/wikibots/tree/main (you have wrote them). [[User:MBH|MBH]] ([[User talk:MBH|talk]]) 12:46, 1 April 2026 (UTC) :Hi @[[User:MBH|MBH]]! I'm a bit busy right now. You can create the new code in a new repository, and use [[Help:Toolforge/My first .NET tool]] to build + deploy it, when building, you can also give the image a different name, like <code>toolforge build start --image-name my-new-web https://gitlab.wikimedia.org/toolforge-repos/sample-dotnet-buildpack-app.git</code>. I can try to give it a look eventually, but I can't commit to anything soon.
[[User:DCaro (WMF)|DCaro (WMF)]] ([[User talk:DCaro (WMF)|talk]]) 15:27, 1 April 2026 (UTC) :: Thanks. I created a repo (https://github.com/Saisengen/webservice/tree/master) and successfully built it, now I have one more issue. When I got 500 server error on Toolforge (when debugging locally on my PC, there is no this issue), error message doesn't written into ''error.log'' file on Toolforge filesystem, unlike early behavior. Where could I read error messages? [[User:MBH|MBH]] ([[User talk:MBH|talk]]) 11:15, 3 April 2026 (UTC) :::You can try <nowiki><code>toolforge we service logs</code></nowiki> (note that if the logs are older than 1h, you'll have to pass <nowiki><code>--since 5h</code></nowiki> for example) [[User:DCaro (WMF)|DCaro (WMF)]] ([[User talk:DCaro (WMF)|talk]]) 15:07, 3 April 2026 (UTC) :::: ''tools.mbh@tools-bastion-15:~$ toolforge we service logs<br> Usage: toolforge [OPTIONS] COMMAND [ARGS]...<br> Try 'toolforge --help' for help.<br> Error: No such command 'we'.'' [[User:MBH|MBH]] ([[User talk:MBH|talk]]) 04:32, 4 April 2026 (UTC) 77y6wc9v7lg8w30up55wg1zqz13dcsi User:Euglou~labswiki 2 448759 2398846 2383034 2026-04-04T05:37:09Z Quiddity 1884 rm self-link (lint error still) 2398846 wikitext text/x-wiki {{Top icon|imagename=Photostudio@1636319934678.jpg|description=user page}} qfh7cduvpfnxkr2c9gxfxdt4v2h388u 2398848 2398846 2026-04-04T08:19:57Z Quiddity 1884 indicator 2398848 wikitext text/x-wiki <indicator name="user page">[[File:Photostudio@1636319934678.jpg|20px]]</indicator> 366074s16qvfzmxoru1thrx62opy14y Map of database maintenance 0 449160 2398837 2398745 2026-04-04T00:02:20Z Dexbot 30554 Bot: Updating the report 2398837 wikitext text/x-wiki {{/Header}} == Today (2026-04-04) == == Yesterday (2026-04-03) == == Last seven days == {| class="wikitable" |+ eqiad |- ! Section !! 
Work
|-
| s2 || [[phab:T419635|Drop il_to column from imagelinks table in wmf production (T419635)]] (fceratto)
|-
| s5 || [[phab:T419635|Drop il_to column from imagelinks table in wmf production (T419635)]] (fceratto)
|-
| s6 || [[phab:T419635|Drop il_to column from imagelinks table in wmf production (T419635)]] (fceratto)
|-
|}
{| class="wikitable"
|+ codfw
|-
! Section !! Work
|-
| s2 || [[phab:T419635|Drop il_to column from imagelinks table in wmf production (T419635)]] (fceratto)
|-
| s3 || [[phab:T419635|Drop il_to column from imagelinks table in wmf production (T419635)]] (fceratto)
|-
| s5 || [[phab:T419635|Drop il_to column from imagelinks table in wmf production (T419635)]] (fceratto)
|-
| s6 || [[phab:T419635|Drop il_to column from imagelinks table in wmf production (T419635)]] (fceratto)
|-
|}
[[Category:MariaDB]]
16rk2n2ni7abvpmuyyvj61w0xg3mrbj Fundraising/techops/procedures/servers-out of band management access 0 453090 2398788 2398721 2026-04-03T12:59:39Z Jgreen (wmf) 18 /* BIOS Access */ 2398788 wikitext text/x-wiki
=Server Out-of-Band Management-Access=
Every Fundraising server has an out-of-band management interface. The interface is connected to a network switch, which is connected to the PFW firewall/router and configured as an isolated management subnet. The management network is accessible via a tunneled SSH connection through the Fundraising bastion servers.
==Dell - DRAC==
===Connect by SSH===
* Make sure your ~/.ssh/config has a Host entry that routes the SSH connection through the bastion server and sets the correct user.
<pre>
Host *.mgmt.frack.*.wmnet
    User root
    ProxyCommand /usr/bin/ssh -q -W %h:%p frbast.wikimedia.org
</pre>
* Then:
<pre>
ssh {host}.mgmt.frack.{dc}.wmnet
Password: <first password is bastion SSH 2FA>
Password: <second password is management password>
</pre>
===Useful Commands/Operations===
<pre>
racadm racdump                  Display Hardware Information and Configuration
racadm getsysinfo               Display System Information (including host IPs)
racadm serveraction <action>    Control Server Power (powerup|powerdown|powercycle)
</pre>
* Change DRAC password (one or the other, depending on DRAC version)
 racadm config -g cfgUserAdmin -o cfgUserAdminPassword -i 2 <thepassword>
 racadm config -g cfgUserAdmin -o cfgUserAdminPassword -i 1 <thepassword>
 racadm set iDRAC.Users.2.Password <thepassword>
* Set to PXE boot on the next restart (newer hosts):
 racadm set iDRAC.ServerBoot.FirstBootDevice PXE
* Set to PXE boot on the next restart (older hosts):
 racadm config -g cfgServerInfo -o cfgServerBootOnce 1
 racadm config -g cfgServerInfo -o cfgServerFirstBootDevice PXE
* Configure Network Interface (usually handled by dcops):
 racadm setniccfg -s 10.64.40.199 255.255.255.192 10.64.40.193
* Commands and Hotkeys to Enter/Exit console or BIOS:
 enter console: console com2
 exit console: ^\
 enter bios: <esc>2
 start pxeboot: ^<esc><2>
* UEFI boot settings
<pre>
racadm set BIOS.BiosBootSettings.BootMode Uefi
racadm jobqueue create BIOS.Setup.1-1 -r pwrcycle -s TIME_NOW
(wait for the box to reboot and update its settings, then...)
racadm set BIOS.NetworkSettings.HttpDev1EnDis Disabled
racadm set BIOS.NetworkSettings.HttpDev2EnDis Disabled
racadm set BIOS.NetworkSettings.HttpDev3EnDis Disabled
racadm set BIOS.NetworkSettings.HttpDev4EnDis Disabled
racadm set BIOS.NetworkSettings.PxeDev1EnDis Enabled
racadm set BIOS.NetworkSettings.PxeDev2EnDis Disabled
racadm set BIOS.NetworkSettings.PxeDev3EnDis Disabled
racadm set BIOS.NetworkSettings.PxeDev4EnDis Disabled
(this may not work, you may have to go into BIOS and set it there)
racadm set BIOS.NetworkSettings.UefiPxeInd NIC.Integrated.1-1-1 Enabled
racadm set BIOS.BiosBootSettings.UefiBootSeq HardDisk.List.1-1,NIC.Integrated.1-1-1
racadm jobqueue create BIOS.Setup.1-1 -r pwrcycle -s TIME_NOW
</pre>
* Serial Console settings
<pre>
racadm set BIOS.SerialCommSettings.SerialComm OnConRedir
racadm set BIOS.SerialCommSettings.SerialPortAddress Com2
racadm set BIOS.SerialCommSettings.RedirAfterBoot Enabled
racadm jobqueue create BIOS.Setup.1-1 -r pwrcycle -s TIME_NOW
</pre>
* Other/General if not already done
* disable pxeboot for "embedded" NICs
* enable pxeboot for first "integrated" NIC
==Supermicro==
===Connect by IPMI===
* ipmitool is installed on the bastion servers
* mpi is a wrapper script to simplify commands, see: mpi -h
* if using ipmitool directly, remember -I and -E (see man page) and don't put the password in the command
* optionally set the password as an environment variable, this works for mpi and ipmitool:
<pre>
read -s -p "IPMI Password: " IPMI_PW
export IPMI_PASSWORD=$IPMI_PW
</pre>
===Serial Over LAN (sol)===
* for console access to BIOS and redirected COM ports
* sol hijacks keyboard input, normal shell escapes won't work
* to exit sol, press <enter>, wait a second, then type ~~.
* if that exit sequence doesn't work, repeat it and/or close your terminal locally
* if arrow keys etc. stop working, exit/kill the session and run 'reset' in the terminal
* to close a lingering sol session run: mpi payments1010 sol deactivate
===BIOS Access===
<pre>
mpi <short hostname> chassis bootdev bios
mpi <short hostname> chassis power cycle
mpi <short hostname> sol activate
</pre>
===BIOS Settings===
* Supermicro X13SEW-F
<pre>
Advanced -> Serial Port Console Redirection -> COM1 / Console Redirection: Enabled
Boot -> Boot Mode Select: UEFI
(following options may change after you set boot mode to UEFI and reboot)
Boot -> FIXED BOOT ORDER Priorities -> Boot Option #1: UEFI Hard Disk
Boot -> FIXED BOOT ORDER Priorities -> Boot Option #2: UEFI Network
</pre>
728bbyaqrkamfgxepz81tnjnkf7lj4w Tool:Gitlab-account-approval/Log 116 453906 2398850 2398303 2026-04-04T11:06:12Z Gitlabaccountapprovalbot 37332 mixcc was rejected. 2398850 wikitext text/x-wiki <noinclude>'''Audit log of approvals''' made by [[gitlab:gitlabaccountapprovalbot|@gitlabaccountapprovalbot]]. __NOTOC__</noinclude> === 2026-04-04 === * 11:06 "mixcc" was rejected (pending since 2026-01-03T11:03:33.922Z). === 2026-04-02 === * 05:30 [[gitlab:mbh1|@mbh1]] was approved. === 2026-04-01 === * 18:21 "yuvrajpatil17" was rejected (pending since 2025-12-31T18:20:27.991Z). * 12:12 [[gitlab:amorii0|@amorii0]] was approved. === 2026-03-31 === * 11:00 "krrishsehgal" was rejected (pending since 2025-12-30T11:00:16.384Z). === 2026-03-30 === * 15:36 [[gitlab:atsuko|@atsuko]] was approved. === 2026-03-29 === * 11:36 [[gitlab:giftcup|@giftcup]] was approved. === 2026-03-28 === * 14:51 [[gitlab:janeeva1|@janeeva1]] was approved. === 2026-03-26 === * 13:36 [[gitlab:saiphani02|@saiphani02]] was approved. * 11:48 [[gitlab:valerioboz-wmch|@valerioboz-wmch]] was approved. === 2026-03-25 === * 09:45 "quansi" was rejected (pending since 2025-12-24T09:42:13.451Z). * 02:18 [[gitlab:viztor|@viztor]] was approved.
=== 2026-03-24 === * 23:18 [[gitlab:maryyann|@maryyann]] was approved. * 23:01 [[gitlab:codenamenoreste|@codenamenoreste]] was approved. * 13:36 [[gitlab:marc-maillard-wmse|@marc-maillard-wmse]] was approved. * 07:39 "fred2675" was rejected (pending since 2025-12-23T07:39:11.380Z). === 2026-03-23 === * 14:51 [[gitlab:komla|@komla]] was approved. * 05:51 "lunachuck43" was rejected (pending since 2025-12-22T05:50:17.862Z). * 04:06 "reza110011" was rejected (pending since 2025-12-22T04:05:25.117Z). === 2026-03-20 === * 21:54 "mertgor" was rejected (pending since 2025-12-19T21:51:51.419Z). * 20:57 "autanmahmah" was rejected (pending since 2025-12-19T20:54:51.678Z). * 09:57 [[gitlab:nethahussain|@nethahussain]] was approved. * 09:27 [[gitlab:piewriter|@piewriter]] was approved. * 08:15 [[gitlab:dondersmooi|@dondersmooi]] was approved. === 2026-03-19 === * 21:03 "sayvhior" was rejected (pending since 2025-12-18T21:02:31.699Z). === 2026-03-18 === * 20:15 [[gitlab:martinmystere|@martinmystere]] was approved. === 2026-03-17 === * 02:51 "louperivois" was rejected (pending since 2025-12-16T02:50:48.197Z). === 2026-03-16 === * 12:54 "mokayaj857" was rejected (pending since 2025-12-15T12:53:39.015Z). * 06:18 "roamer15" was rejected (pending since 2025-12-15T06:16:38.042Z). === 2026-03-14 === * 11:12 "umaramuhammad" was rejected (pending since 2025-12-13T11:10:44.004Z). * 09:33 "akuma19" was rejected (pending since 2025-12-13T09:31:39.044Z). * 07:06 [[gitlab:syunsyunminmin|@syunsyunminmin]] was approved. === 2026-03-12 === * 20:24 [[gitlab:11wb|@11wb]] was approved. * 09:54 [[gitlab:bcxfu75k|@bcxfu75k]] was approved. === 2026-03-10 === * 09:12 [[gitlab:viktoriahillerudwmse|@viktoriahillerudwmse]] was approved. === 2026-03-06 === * 08:09 "vazhayilnewone" was rejected (pending since 2025-12-05T08:07:02.184Z). === 2026-03-04 === * 20:54 [[gitlab:elphie|@elphie]] was approved. * 11:39 "ronaldahmed" was rejected (pending since 2025-12-03T11:37:47.492Z). 
* 02:12 "ltslw" was rejected (pending since 2025-12-03T02:11:52.040Z). === 2026-03-02 === * 19:21 "dlopez350" was rejected (pending since 2025-12-01T19:20:38.918Z). * 18:15 [[gitlab:lsandergreen|@lsandergreen]] was approved. === 2026-03-01 === * 10:51 [[gitlab:clintacc|@clintacc]] was approved. === 2026-02-28 === * 09:24 "cardboardlamp" was rejected (pending since 2025-11-29T09:22:03.947Z). * 08:18 "wiki-pavan" was rejected (pending since 2025-11-29T08:16:24.184Z). === 2026-02-27 === * 20:45 "thisisrick25" was rejected (pending since 2025-11-28T20:42:24.454Z). === 2026-02-26 === * 13:57 "chuiimuiiofc" was rejected (pending since 2025-11-27T13:57:02.794Z). * 13:54 "steffpro" was rejected (pending since 2025-11-27T13:52:10.859Z). === 2026-02-25 === * 21:24 "abubakarhabibudayyabu" was rejected (pending since 2025-11-26T21:22:37.776Z). === 2026-02-24 === * 05:00 "playboi" was rejected (pending since 2025-11-25T05:00:30.762Z). === 2026-02-23 === * 14:00 "alph65" was rejected (pending since 2025-11-24T13:59:00.797Z). * 12:33 [[gitlab:robertsky|@robertsky]] was approved. === 2026-02-22 === * 00:30 "hp8p" was rejected (pending since 2025-11-23T00:29:24.741Z). === 2026-02-19 === * 16:45 "clayjar" was rejected (pending since 2025-11-20T16:44:48.380Z). === 2026-02-18 === * 22:18 "nexus" was rejected (pending since 2025-11-19T22:16:48.818Z). * 12:00 "bernsteinnn" was rejected (pending since 2025-11-19T11:59:04.427Z). === 2026-02-17 === * 11:36 "jason2000-cpu" was rejected (pending since 2025-11-18T11:34:00.314Z). === 2026-02-16 === * 14:54 "smaurya" was rejected (pending since 2025-11-17T14:52:06.906Z). === 2026-02-15 === * 16:51 "kra-79" was rejected (pending since 2025-11-16T16:50:41.375Z). === 2026-02-14 === * 15:15 [[gitlab:mess|@mess]] was approved. === 2026-02-13 === * 13:57 "sopalsuemae957" was rejected (pending since 2025-11-14T13:55:16.921Z). * 13:30 [[gitlab:wyslijp16-toolforge|@wyslijp16-toolforge]] was approved. 
=== 2026-02-12 === * 16:30 "kristinagligoric" was rejected (pending since 2025-11-13T16:29:21.646Z). * 03:33 [[gitlab:anyehansen|@anyehansen]] was approved. * 02:21 [[gitlab:thejoyfultentmaker|@thejoyfultentmaker]] was approved. === 2026-02-10 === * 13:18 [[gitlab:db111|@db111]] was approved. === 2026-02-09 === * 19:06 "squirrel289" was rejected (pending since 2025-11-10T19:04:27.831Z). === 2026-02-06 === * 20:54 [[gitlab:gillux|@gillux]] was approved. * 09:09 [[gitlab:lih|@lih]] was approved. === 2026-01-31 === * 16:21 [[gitlab:taxonbot1|@taxonbot1]] was approved. === 2026-01-28 === * 14:30 [[gitlab:ademola|@ademola]] was approved. * 10:51 "watshell" was rejected (pending since 2025-10-29T10:51:01.521Z). === 2026-01-26 === * 23:06 "tavaresgmg" was rejected (pending since 2025-10-27T23:04:42.140Z). === 2026-01-25 === * 06:03 "cata" was rejected (pending since 2025-10-26T06:01:26.155Z). === 2026-01-24 === * 21:15 [[gitlab:wiegels|@wiegels]] was approved. * 06:30 [[gitlab:blaquans|@blaquans]] was approved. === 2026-01-23 === * 16:27 [[gitlab:lerickson|@lerickson]] was approved. * 10:15 "fran0035g" was rejected (pending since 2025-10-24T10:12:17.732Z). === 2026-01-22 === * 21:00 "hacksyn" was rejected (pending since 2025-10-23T20:59:15.982Z). === 2026-01-21 === * 17:30 [[gitlab:otcenas11|@otcenas11]] was approved. === 2026-01-19 === * 21:48 [[gitlab:amdrel|@amdrel]] was approved. * 04:36 "rayalexa" was rejected (pending since 2025-10-20T04:35:02.094Z). === 2026-01-18 === * 15:45 "somya" was rejected (pending since 2025-10-19T15:43:43.701Z). * 06:54 "sergg001" was rejected (pending since 2025-10-19T06:54:12.296Z). === 2026-01-16 === * 11:57 "zeejohsy" was rejected (pending since 2025-10-17T11:56:22.372Z). * 04:45 "rocky25" was rejected (pending since 2025-10-17T04:43:33.180Z). === 2026-01-15 === * 16:39 "tiisu" was rejected (pending since 2025-10-16T16:37:18.438Z). * 12:00 "noahalorwu" was rejected (pending since 2025-10-16T11:58:26.133Z). 
* 10:39 "prjayaiuedu" was rejected (pending since 2025-10-16T10:37:16.947Z). === 2026-01-13 === * 17:21 [[gitlab:lwilson-ctr|@lwilson-ctr]] was approved. === 2026-01-12 === * 17:03 "stagietechs" was rejected (pending since 2025-10-13T17:02:25.281Z). === 2026-01-10 === * 19:06 "keerthisr" was rejected (pending since 2025-10-11T19:05:01.758Z). === 2026-01-09 === * 20:36 "lightb" was rejected (pending since 2025-10-10T20:34:20.264Z). === 2026-01-08 === * 19:42 [[gitlab:tbodt|@tbodt]] was approved. * 13:57 [[gitlab:martynranyard|@martynranyard]] was approved. === 2026-01-07 === * 17:48 [[gitlab:santanuwiki25|@santanuwiki25]] was approved. * 14:27 "dipanshu" was rejected (pending since 2025-10-08T14:26:10.794Z). * 12:30 "adeolaadesina" was rejected (pending since 2025-10-08T12:29:49.592Z). * 09:21 "tony-kamande" was rejected (pending since 2025-10-08T09:20:28.421Z). * 06:18 "hninwuttyi" was rejected (pending since 2025-10-08T06:17:28.006Z). * 05:09 "andume" was rejected (pending since 2025-10-08T05:07:18.582Z). * 02:00 "mosope" was rejected (pending since 2025-10-08T01:59:54.800Z). * 01:15 [[gitlab:tungstalite|@tungstalite]] was approved. === 2026-01-06 === * 18:24 "leerensucher" was rejected (pending since 2025-10-07T18:21:41.253Z). * 14:54 "leonidlednev" was rejected (pending since 2025-10-07T14:53:07.273Z). * 12:57 "alexandre-tingaud" was rejected (pending since 2025-10-07T12:54:27.206Z). === 2026-01-04 === * 21:33 [[gitlab:matr1x-101|@matr1x-101]] was approved. * 15:18 "makjr" was rejected (pending since 2025-10-05T15:16:31.558Z). * 14:09 "dakshq" was rejected (pending since 2025-10-05T14:08:40.608Z). === 2026-01-03 === * 20:42 [[gitlab:apehitkey|@apehitkey]] was approved. * 18:00 [[gitlab:jeremyb|@jeremyb]] was approved. * 14:09 [[gitlab:twelephant|@twelephant]] was approved. === 2026-01-01 === * 11:30 "shellstanislav" was rejected (pending since 2025-10-02T11:29:10.150Z). 
=== 2025-12-30 === * 19:51 "camilojdiaz" was rejected (pending since 2025-09-30T19:49:24.913Z). === 2025-12-29 === * 16:03 "zied" was rejected (pending since 2025-09-29T16:01:30.415Z). * 08:18 "rahulsidpradhan" was rejected (pending since 2025-09-29T08:17:02.849Z). === 2025-12-26 === * 09:48 "thembo42" was rejected (pending since 2025-09-26T09:45:15.033Z). === 2025-12-25 === * 14:03 "196936074751" was rejected (pending since 2025-09-25T14:02:31.367Z). === 2025-12-23 === * 16:21 "ngarnsworthy" was rejected (pending since 2025-09-23T16:20:41.211Z). === 2025-12-22 === * 12:39 "aza555" was rejected (pending since 2025-09-22T12:38:02.622Z). === 2025-12-20 === * 23:45 "saph" was rejected (pending since 2025-09-20T23:45:01.222Z). === 2025-12-19 === * 10:15 "vladdymoses" was rejected (pending since 2025-09-19T10:15:00.999Z). * 07:15 "dirtylittlepoobah" was rejected (pending since 2025-09-19T07:13:55.537Z). === 2025-12-18 === * 16:24 [[gitlab:guyfawcus|@guyfawcus]] was approved. === 2025-12-17 === * 21:39 [[gitlab:holdyourhorses|@holdyourhorses]] was approved. * 18:30 "prudencia" was rejected (pending since 2025-09-17T18:27:18.860Z). * 02:24 "lottie" was rejected (pending since 2025-09-17T02:21:21.744Z). === 2025-12-16 === * 09:39 [[gitlab:melcatherine|@melcatherine]] was approved. * 08:54 [[gitlab:leila237|@leila237]] was approved. === 2025-12-15 === * 18:27 [[gitlab:royalsailor|@royalsailor]] was approved. * 09:39 [[gitlab:olaf8940|@olaf8940]] was approved. * 09:39 "brianbybyby" was rejected (pending since 2025-09-15T09:37:45.430Z). === 2025-12-14 === * 20:21 [[gitlab:essa237|@essa237]] was approved. * 16:42 [[gitlab:bovimacoco|@bovimacoco]] was approved. === 2025-12-13 === * 21:54 "mmns21" was rejected (pending since 2025-09-13T21:52:24.017Z). * 20:33 "bugcrawler" was rejected (pending since 2025-09-13T20:31:09.211Z). === 2025-12-12 === * 14:39 "ruvchoudhary" was rejected (pending since 2025-09-12T14:36:16.167Z). 
* 06:54 "rezadress" was rejected (pending since 2025-09-12T06:52:21.749Z). === 2025-12-10 === * 17:30 [[gitlab:itsmoon|@itsmoon]] was approved. === 2025-12-09 === * 15:42 [[gitlab:mercy-o|@mercy-o]] was approved. === 2025-12-06 === * 16:45 "jacquesradjabu" was rejected (pending since 2025-09-06T16:45:17.969Z). * 11:27 [[gitlab:ikhitron|@ikhitron]] was approved. === 2025-12-01 === * 08:12 "halconmilenario21" was rejected (pending since 2025-09-01T08:12:10.262Z). === 2025-11-30 === * 21:06 [[gitlab:habs|@habs]] was approved. === 2025-11-29 === * 16:36 "bovimacoco" was rejected (pending since 2025-08-30T16:34:39.712Z). * 00:45 [[gitlab:jjpmaster|@jjpmaster]] was approved. === 2025-11-24 === * 10:30 "alph65" was rejected (pending since 2025-08-25T10:28:40.957Z). * 02:24 [[gitlab:yaron|@yaron]] was approved. === 2025-11-20 === * 16:06 "clayjar" was rejected (pending since 2025-08-21T16:04:54.450Z). === 2025-11-17 === * 21:09 [[gitlab:ankita97531|@ankita97531]] was approved. === 2025-11-16 === * 14:15 "commanderkefir" was rejected (pending since 2025-08-17T14:13:14.791Z). * 08:21 "rehankhan78" was rejected (pending since 2025-08-17T08:19:44.896Z). === 2025-11-15 === * 14:36 "cyberscribe" was rejected (pending since 2025-08-16T14:34:27.230Z). === 2025-11-13 === * 04:21 "waddie96" was rejected (pending since 2025-08-14T04:19:27.461Z). === 2025-11-11 === * 06:42 [[gitlab:seanhoyland|@seanhoyland]] was approved. === 2025-11-10 === * 00:06 [[gitlab:jaredblumer|@jaredblumer]] was approved. === 2025-11-09 === * 22:36 "heinxiety" was rejected (pending since 2025-08-10T22:33:12.041Z). === 2025-11-07 === * 22:00 [[gitlab:forzagreen|@forzagreen]] was approved. === 2025-11-06 === * 16:57 [[gitlab:rsilvola|@rsilvola]] was approved. === 2025-11-04 === * 21:24 [[gitlab:devdoingdev|@devdoingdev]] was approved. === 2025-11-03 === * 17:48 "joewaleed98" was rejected (pending since 2025-08-04T17:46:12.191Z). 
=== 2025-11-01 === * 18:00 "eliasempresas" was rejected (pending since 2025-08-02T17:58:04.412Z). === 2025-10-31 === * 18:51 [[gitlab:chaoticenby|@chaoticenby]] was approved. * 04:33 "3ch310n" was rejected (pending since 2025-08-01T04:32:21.982Z). === 2025-10-30 === * 10:03 [[gitlab:tausheefhassan|@tausheefhassan]] was approved. === 2025-10-29 === * 14:54 "theap" was rejected (pending since 2025-07-30T14:52:12.066Z). === 2025-10-28 === * 06:06 [[gitlab:tanbiruzzaman|@tanbiruzzaman]] was approved. === 2025-10-27 === * 07:51 [[gitlab:jmoore111|@jmoore111]] was approved. === 2025-10-25 === * 21:09 [[gitlab:valor|@valor]] was approved. * 21:03 [[gitlab:booksmurf|@booksmurf]] was approved. * 02:48 "mystyc1" was rejected (pending since 2025-07-26T02:46:19.373Z). === 2025-10-24 === * 05:12 "aadarshmahesh" was rejected (pending since 2025-07-25T05:09:38.264Z). === 2025-10-22 === * 20:54 [[gitlab:janewanga|@janewanga]] was approved. * 17:27 "abeljeevan" was rejected (pending since 2025-07-23T17:26:46.884Z). * 16:12 "shrimpnaur" was rejected (pending since 2025-07-23T16:10:37.864Z). === 2025-10-21 === * 18:51 "jrmuizel" was rejected (pending since 2025-07-22T18:50:07.315Z). * 09:33 [[gitlab:dpogorzelski|@dpogorzelski]] was approved. === 2025-10-17 === * 13:21 [[gitlab:blegodwin|@blegodwin]] was approved. === 2025-10-16 === * 14:51 [[gitlab:bahago|@bahago]] was approved. * 14:12 "harikrishna0005" was rejected (pending since 2025-07-17T14:10:48.385Z). * 14:09 "gauthammohanraj" was rejected (pending since 2025-07-17T14:08:47.643Z). === 2025-10-15 === * 13:48 [[gitlab:adwivedii|@adwivedii]] was approved. * 13:18 [[gitlab:kimbrenekakande|@kimbrenekakande]] was approved. * 13:03 "childmnajennifer" was rejected (pending since 2025-07-16T13:01:50.236Z). * 05:06 "vssb4214" was rejected (pending since 2025-07-16T05:05:33.985Z). === 2025-10-14 === * 19:39 [[gitlab:afanyulionel|@afanyulionel]] was approved. * 15:33 [[gitlab:sadrettin|@sadrettin]] was approved. 
* 14:18 [[gitlab:tmwyk|@tmwyk]] was approved. * 08:42 "yasu0796" was rejected (pending since 2025-07-15T08:41:26.453Z). === 2025-10-13 === * 16:09 [[gitlab:atlas0007|@atlas0007]] was approved. === 2025-10-11 === * 17:42 [[gitlab:techwizzie|@techwizzie]] was approved. === 2025-10-10 === * 19:03 [[gitlab:miiswom|@miiswom]] was approved. * 16:06 [[gitlab:ninatakang|@ninatakang]] was approved. === 2025-10-09 === * 15:42 [[gitlab:jaykaneki|@jaykaneki]] was approved. * 14:21 [[gitlab:lebogang|@lebogang]] was approved. * 14:15 [[gitlab:kimondorose|@kimondorose]] was approved. * 13:48 [[gitlab:joyakinyi|@joyakinyi]] was approved. * 13:48 [[gitlab:dikshyashahi|@dikshyashahi]] was approved. * 13:45 [[gitlab:obediobadiah|@obediobadiah]] was approved. * 13:45 [[gitlab:system625|@system625]] was approved. * 13:45 [[gitlab:rolalove|@rolalove]] was approved. * 13:39 [[gitlab:olatundeawo|@olatundeawo]] was approved. * 13:36 [[gitlab:danielchristlight|@danielchristlight]] was approved. * 13:36 [[gitlab:dipanshu1223|@dipanshu1223]] was approved. * 13:36 [[gitlab:aradhya|@aradhya]] was approved. * 09:57 "bognd" was rejected (pending since 2025-07-10T09:55:48.661Z). === 2025-10-08 === * 23:36 [[gitlab:sopzy|@sopzy]] was approved. * 23:03 [[gitlab:oluwatumininu|@oluwatumininu]] was approved. * 19:39 [[gitlab:levon003|@levon003]] was approved. * 15:24 [[gitlab:ritika-bhambri11|@ritika-bhambri11]] was approved. * 13:45 [[gitlab:anbanguyen|@anbanguyen]] was approved. * 13:36 [[gitlab:chumzine|@chumzine]] was approved. * 13:27 [[gitlab:shr0x-ya|@shr0x-ya]] was approved. * 12:45 [[gitlab:nurahwakili|@nurahwakili]] was approved. * 03:42 "nazhiba" was rejected (pending since 2025-07-09T03:40:12.625Z). * 02:12 "mafennel" was rejected (pending since 2025-07-09T02:11:40.598Z). === 2025-10-07 === * 22:54 [[gitlab:olusegunfaj|@olusegunfaj]] was approved. * 21:30 [[gitlab:rona|@rona]] was approved. * 21:09 [[gitlab:sandijigs|@sandijigs]] was approved. 
* 13:36 "xisbajao" was rejected (pending since 2025-07-08T13:33:35.018Z). * 01:36 "areczek94" was rejected (pending since 2025-07-08T01:35:40.633Z). === 2025-10-06 === * 19:21 "wmcarter2017" was rejected (pending since 2025-07-07T19:21:12.899Z). === 2025-10-05 === * 14:15 "meetmendapara" was rejected (pending since 2025-07-06T14:14:16.726Z). === 2025-10-04 === * 20:51 "nftbaee" was rejected (pending since 2025-07-05T20:50:57.688Z). === 2025-10-03 === * 06:12 [[gitlab:javiermonton|@javiermonton]] was approved. === 2025-10-02 === * 20:15 "talaqalotaibipmp" was rejected (pending since 2025-07-03T20:13:05.164Z). === 2025-10-01 === * 10:54 "bjensen" was rejected (pending since 2025-07-02T10:53:46.574Z). * 02:45 "kowal1984" was rejected (pending since 2025-07-02T02:44:56.946Z). === 2025-09-30 === * 21:21 [[gitlab:kavaljeetsingh|@kavaljeetsingh]] was approved. * 00:24 "adium" was rejected (pending since 2025-07-01T00:23:43.807Z). === 2025-09-28 === * 08:54 [[gitlab:pexerik|@pexerik]] was approved. === 2025-09-27 === * 13:57 [[gitlab:rubahhitamvukova|@rubahhitamvukova]] was approved. === 2025-09-26 === * 16:57 "algorithmic" was rejected (pending since 2025-06-27T16:56:17.480Z). * 13:54 [[gitlab:shadabgdg|@shadabgdg]] was approved. * 13:12 [[gitlab:spushpit|@spushpit]] was approved. === 2025-09-20 === * 14:06 "bwiki" was rejected (pending since 2025-06-21T13:59:14.749Z). === 2025-09-16 === * 05:39 [[gitlab:deepchirp|@deepchirp]] was approved. === 2025-09-15 === * 22:00 [[gitlab:noisk8|@noisk8]] was approved. * 11:03 "ahonc" was rejected (pending since 2025-06-16T11:00:54.843Z). === 2025-09-13 === * 18:24 "a-ssh22" was rejected (pending since 2025-06-14T18:23:33.937Z). * 12:36 [[gitlab:rajashreetalukdar|@rajashreetalukdar]] was approved. * 00:45 [[gitlab:sumitsurai|@sumitsurai]] was approved. === 2025-09-12 === * 17:12 [[gitlab:suyash23|@suyash23]] was approved. * 00:46 "remotetravel" was rejected (pending since 2025-06-13T00:44:08.171Z). 
=== 2025-09-10 === * 21:09 "jancborchardt" was rejected (pending since 2025-06-11T21:06:30.759Z). === 2025-09-09 === * 17:03 [[gitlab:vwf|@vwf]] was approved. * 06:36 [[gitlab:cactusisme|@cactusisme]] was approved. === 2025-09-08 === * 18:09 "birushandegeya" was rejected (pending since 2025-06-09T18:08:00.087Z). * 16:27 "ngarnsworthy" was rejected (pending since 2025-06-09T16:24:37.213Z). * 12:33 "zolgoyo" was rejected (pending since 2025-06-09T12:31:34.199Z). === 2025-09-06 === * 23:09 [[gitlab:jaishsingh913|@jaishsingh913]] was approved. === 2025-09-05 === * 21:45 [[gitlab:sakshi2|@sakshi2]] was approved. * 20:42 "abdukhaliq1" was rejected (pending since 2025-06-06T20:40:42.023Z). * 14:27 "beubsamy" was rejected (pending since 2025-06-06T14:27:06.781Z). === 2025-09-04 === * 23:27 "sdhehua" was rejected (pending since 2025-06-05T23:24:45.777Z). * 19:00 [[gitlab:perry|@perry]] was approved. * 11:24 "saintwolf" was rejected (pending since 2025-06-05T11:21:20.176Z). === 2025-09-02 === * 05:48 [[gitlab:aliu|@aliu]] was approved. === 2025-08-29 === * 13:30 "kksurendran066" was rejected (pending since 2025-05-30T13:27:48.755Z). === 2025-08-28 === * 22:18 "tauraamuix" was rejected (pending since 2025-05-29T22:16:08.228Z). === 2025-08-26 === * 19:03 [[gitlab:dikkulah|@dikkulah]] was approved. === 2025-08-22 === * 23:51 [[gitlab:khoroshun_mike|@khoroshun_mike]] was approved. === 2025-08-21 === * 07:39 [[gitlab:yuka|@yuka]] was approved. === 2025-08-19 === * 07:48 [[gitlab:zhaofjx|@zhaofjx]] was approved. === 2025-08-17 === * 14:27 "madhan13k" was rejected (pending since 2025-05-18T14:26:08.973Z). === 2025-08-15 === * 10:15 "mohammed_abukhadra" was rejected (pending since 2025-05-16T10:14:48.403Z). === 2025-08-11 === * 11:48 "hmmyesbro" was rejected (pending since 2025-05-12T11:45:24.350Z). === 2025-08-10 === * 13:15 [[gitlab:dactyl|@dactyl]] was approved. === 2025-08-09 === * 04:39 "xxxx100000" was rejected (pending since 2025-05-10T04:37:44.949Z). 
=== 2025-08-08 === * 14:33 [[gitlab:josefanthony|@josefanthony]] was approved. === 2025-08-07 === * 23:42 [[gitlab:robins7|@robins7]] was approved. * 21:42 [[gitlab:pols12|@pols12]] was approved. * 17:15 "sbronson" was rejected (pending since 2025-05-08T17:15:08.834Z). * 14:57 [[gitlab:alvindulle|@alvindulle]] was approved. * 14:45 [[gitlab:xentos|@xentos]] was approved. * 06:27 "jamesboste" was rejected (pending since 2025-05-08T06:25:14.793Z). * 03:57 "ysun" was rejected (pending since 2025-05-08T03:55:07.348Z). === 2025-08-06 === * 21:51 "pols12" was rejected (pending since 2025-05-07T21:49:13.598Z). * 01:51 "okeamah" was rejected (pending since 2025-05-07T01:48:50.114Z). === 2025-08-05 === * 09:15 "mobashir-2013" was rejected (pending since 2025-05-06T09:14:24.069Z). === 2025-08-01 === * 08:00 "douginamug" was rejected (pending since 2025-05-02T07:57:38.317Z). === 2025-07-31 === * 02:30 [[gitlab:ads|@ads]] was approved. === 2025-07-27 === * 13:15 "mrico2703" was rejected (pending since 2025-04-27T13:13:12.346Z). * 10:17 [[gitlab:josephfrancis12|@josephfrancis12]] was approved. * 10:17 [[gitlab:fuzzew|@fuzzew]] was approved. * 05:57 [[gitlab:biscuitbobby|@biscuitbobby]] was approved. * 05:48 [[gitlab:ecoholic|@ecoholic]] was approved. === 2025-07-26 === * 11:48 [[gitlab:chimnayyyy|@chimnayyyy]] was approved. * 11:48 [[gitlab:alwinalbert|@alwinalbert]] was approved. * 11:48 [[gitlab:hridyakk|@hridyakk]] was approved. * 11:45 [[gitlab:gaurigupta21|@gaurigupta21]] was approved. * 11:45 [[gitlab:binetaa|@binetaa]] was approved. * 10:21 [[gitlab:jyothikat22|@jyothikat22]] was approved. * 10:21 [[gitlab:zobotrombie|@zobotrombie]] was approved. * 10:21 [[gitlab:flykrth|@flykrth]] was approved. * 10:21 [[gitlab:mehrinshamim|@mehrinshamim]] was approved. * 10:21 [[gitlab:aadhi13|@aadhi13]] was approved. * 10:21 [[gitlab:malavikam05|@malavikam05]] was approved. * 10:18 [[gitlab:nf609|@nf609]] was approved. * 05:48 [[gitlab:nazalnihad|@nazalnihad]] was approved. 
* 05:48 [[gitlab:naveen28204280|@naveen28204280]] was approved. === 2025-07-25 === * 09:49 [[gitlab:kasyap9|@kasyap9]] was approved. * 09:30 [[gitlab:swayamagrahari|@swayamagrahari]] was approved. === 2025-07-24 === * 19:36 [[gitlab:madutgn|@madutgn]] was approved. === 2025-07-23 === * 20:09 [[gitlab:somerandomdeveloper|@somerandomdeveloper]] was approved. === 2025-07-22 === * 00:15 [[gitlab:iagoqnsi|@iagoqnsi]] was approved. === 2025-07-21 === * 17:30 [[gitlab:asadiqui|@asadiqui]] was approved. * 16:39 [[gitlab:tryvix1509|@tryvix1509]] was approved. * 04:27 [[gitlab:damian|@damian]] was approved. === 2025-07-20 === * 09:42 "mike-khoroshun" was rejected (pending since 2025-04-20T09:42:22.732Z). === 2025-07-17 === * 17:57 [[gitlab:haroldkrabs|@haroldkrabs]] was approved. * 13:45 [[gitlab:envlh|@envlh]] was approved. === 2025-07-14 === * 10:24 [[gitlab:missguru|@missguru]] was approved. * 00:57 "clarfonthey" was rejected (pending since 2025-04-14T00:56:32.626Z). === 2025-07-13 === * 01:01 [[gitlab:l235|@l235]] was approved. === 2025-07-11 === * 03:06 "rodavlas" was rejected (pending since 2025-04-11T03:05:45.590Z). === 2025-07-06 === * 00:09 "lakasa" was rejected (pending since 2025-04-06T00:06:28.469Z). === 2025-07-05 === * 21:54 "ctrlzvi" was rejected (pending since 2025-04-05T21:54:12.542Z). * 14:30 "aminualiyu" was rejected (pending since 2025-04-05T14:27:22.617Z). === 2025-07-04 === * 03:15 [[gitlab:galstar|@galstar]] was approved. === 2025-07-02 === * 11:27 "vicolas11" was rejected (pending since 2025-04-02T11:25:12.682Z). === 2025-06-29 === * 23:12 "naomi723" was rejected (pending since 2025-03-30T23:09:24.630Z). === 2025-06-28 === * 16:21 "mudeh2372" was rejected (pending since 2025-03-29T16:18:27.057Z). === 2025-06-27 === * 23:18 "rony143" was rejected (pending since 2025-03-28T23:16:13.671Z). * 22:21 [[gitlab:rluts|@rluts]] was approved. === 2025-06-26 === * 13:54 "creativegurus" was rejected (pending since 2025-03-27T13:52:41.706Z). 
=== 2025-06-24 === * 17:42 [[gitlab:devjadiya|@devjadiya]] was approved. * 14:00 "dominic-r" was rejected (pending since 2025-03-25T14:00:07.307Z). === 2025-06-21 === * 00:48 [[gitlab:vriaa|@vriaa]] was approved. === 2025-06-18 === * 15:21 "ayushkhati1" was rejected (pending since 2025-03-19T15:18:50.062Z). === 2025-06-17 === * 20:45 "chiomavero" was rejected (pending since 2025-03-18T20:44:13.967Z). * 00:27 [[gitlab:eggroll97|@eggroll97]] was approved. === 2025-06-14 === * 20:57 "volvox" was rejected (pending since 2025-03-15T20:56:34.018Z). === 2025-06-13 === * 16:09 [[gitlab:supergrey|@supergrey]] was approved. * 11:03 "chqaz" was rejected (pending since 2025-03-14T11:01:09.600Z). * 10:24 [[gitlab:slong-wmf|@slong-wmf]] was approved. * 10:15 "hearvox" was rejected (pending since 2025-03-14T10:13:13.112Z). === 2025-06-12 === * 15:18 "jlam" was rejected (pending since 2025-03-13T15:17:54.099Z). === 2025-06-09 === * 20:48 "dipanjansengupta" was rejected (pending since 2025-03-10T20:48:03.545Z). * 19:27 [[gitlab:reggycelly|@reggycelly]] was approved. * 14:51 "arendpieter" was rejected (pending since 2025-03-10T14:51:01.445Z). * 13:21 [[gitlab:greenreaper|@greenreaper]] was approved. * 09:33 [[gitlab:mmta|@mmta]] was approved. * 08:03 "a-ssh22" was rejected (pending since 2025-03-10T08:03:08.111Z). === 2025-06-08 === * 21:06 "mm-episodenlistedlvaupdater" was rejected (pending since 2025-03-09T21:04:06.323Z). === 2025-06-06 === * 11:06 [[gitlab:olea|@olea]] was approved. === 2025-06-05 === * 20:33 [[gitlab:encodedwp|@encodedwp]] was approved. * 15:00 [[gitlab:toluayo|@toluayo]] was approved. * 13:51 [[gitlab:arnold_lup|@arnold_lup]] was approved. * 11:54 "sdhehua" was rejected (pending since 2025-03-06T11:51:48.241Z). === 2025-06-03 === * 21:27 [[gitlab:wewakey|@wewakey]] was approved. * 12:36 "hunsimon2" was rejected (pending since 2025-03-04T12:34:56.520Z). * 11:54 "hunsimon" was rejected (pending since 2025-03-04T11:53:54.652Z). 
=== 2025-06-02 === * 12:01 [[gitlab:jaimedes|@jaimedes]] was approved. === 2025-05-30 === * 18:00 "sathvik9105" was rejected (pending since 2025-02-28T17:59:42.867Z). * 11:21 [[gitlab:tonythomas01|@tonythomas01]] was approved. * 10:06 [[gitlab:gpsleo|@gpsleo]] was approved. === 2025-05-29 === * 22:12 [[gitlab:codynguyen1116|@codynguyen1116]] was approved. === 2025-05-28 === * 02:57 [[gitlab:saper|@saper]] was approved. === 2025-05-27 === * 21:06 [[gitlab:mohammed_qays|@mohammed_qays]] was approved. * 15:33 "satanluimm" was rejected (pending since 2025-02-25T15:32:48.101Z). === 2025-05-26 === * 23:57 "seyedali220" was rejected (pending since 2025-02-24T23:56:17.621Z). === 2025-05-21 === * 11:12 [[gitlab:guilherme|@guilherme]] was approved. === 2025-05-19 === * 13:24 [[gitlab:emojiwiki|@emojiwiki]] was approved. === 2025-05-18 === * 00:00 "xidme" was rejected (pending since 2025-02-15T23:58:56.796Z). === 2025-05-17 === * 02:39 "kdh8219" was rejected (pending since 2025-02-15T02:36:32.237Z). === 2025-05-16 === * 15:09 [[gitlab:maxbinderwmf|@maxbinderwmf]] was approved. === 2025-05-15 === * 04:30 "inspectorzer0" was rejected (pending since 2025-02-13T04:27:33.179Z). === 2025-05-14 === * 17:42 [[gitlab:llugo|@llugo]] was approved. === 2025-05-13 === * 20:18 "mmta" was rejected (pending since 2025-02-11T20:17:23.407Z). === 2025-05-11 === * 20:51 "jad" was rejected (pending since 2025-02-09T20:49:07.333Z). * 17:54 "nishchalsundan" was rejected (pending since 2025-02-09T17:52:25.761Z). * 16:39 "mohammed_abukhadra" was rejected (pending since 2025-02-09T16:39:03.730Z). === 2025-05-09 === * 09:12 [[gitlab:sirchanmp|@sirchanmp]] was approved. === 2025-05-08 === * 08:18 [[gitlab:mengeditch|@mengeditch]] was approved. === 2025-05-07 === * 03:45 "xluffy" was rejected (pending since 2025-02-05T03:45:14.181Z). === 2025-05-06 === * 16:54 "punhaniabhishek" was rejected (pending since 2025-02-04T16:53:50.758Z). * 09:36 [[gitlab:bmartinezcalvo|@bmartinezcalvo]] was approved. 
=== 2025-05-02 === * 12:24 [[gitlab:tohaomg|@tohaomg]] was approved. * 11:48 [[gitlab:mavrikant|@mavrikant]] was approved. * 11:45 [[gitlab:daanvr|@daanvr]] was approved. === 2025-05-01 === * 09:09 "mjoerg" was rejected (pending since 2025-01-30T09:09:04.204Z). === 2025-04-30 === * 23:06 "sanskardubey" was rejected (pending since 2025-01-29T23:03:25.489Z). === 2025-04-29 === * 16:00 "geyslein" was rejected (pending since 2025-01-28T16:00:01.510Z). === 2025-04-26 === * 09:30 "anjali9027" was rejected (pending since 2025-01-25T09:28:07.064Z). === 2025-04-25 === * 18:00 "salahhazaa" was rejected (pending since 2025-01-24T17:58:30.030Z). * 15:15 [[gitlab:yiming|@yiming]] was approved. * 02:06 "mrchanmp" was rejected (pending since 2025-01-24T02:03:58.308Z). === 2025-04-23 === * 17:03 "rj2904" was rejected (pending since 2025-01-22T17:03:11.207Z). * 14:21 "nischay33" was rejected (pending since 2025-01-22T14:19:21.081Z). === 2025-04-22 === * 19:27 "dj80" was rejected (pending since 2025-01-21T19:25:28.498Z). * 14:30 [[gitlab:kaimamin|@kaimamin]] was approved. * 09:57 "debo" was rejected (pending since 2025-01-21T09:54:47.955Z). === 2025-04-21 === * 12:24 "unshell" was rejected (pending since 2025-01-20T12:21:59.686Z). === 2025-04-18 === * 15:06 [[gitlab:spartanarbinger|@spartanarbinger]] was approved. === 2025-04-16 === * 03:09 "dewey" was rejected (pending since 2025-01-15T03:06:17.488Z). === 2025-04-15 === * 19:45 "emdadul" was rejected (pending since 2025-01-14T19:42:29.285Z). === 2025-04-14 === * 06:45 [[gitlab:bcampbell804|@bcampbell804]] was approved. === 2025-04-11 === * 06:27 [[gitlab:jvanderhoop|@jvanderhoop]] was approved. === 2025-04-10 === * 04:12 "bhai420" was rejected (pending since 2025-01-09T04:10:29.430Z). === 2025-04-09 === * 05:03 "austinvarshney" was rejected (pending since 2025-01-08T05:02:34.175Z). === 2025-04-06 === * 15:36 [[gitlab:elph|@elph]] was approved. === 2025-04-02 === * 10:33 [[gitlab:ozge|@ozge]] was approved. 
=== 2025-03-31 === * 20:15 "demandkey" was rejected (pending since 2024-12-30T20:14:23.096Z). * 15:18 [[gitlab:danyya|@danyya]] was approved. === 2025-03-28 === * 15:54 [[gitlab:rutsavi09|@rutsavi09]] was approved. * 15:54 [[gitlab:ilanen1|@ilanen1]] was approved. === 2025-03-25 === * 19:27 [[gitlab:irfo|@irfo]] was approved. * 11:54 [[gitlab:kmontalva-wmf|@kmontalva-wmf]] was approved. * 04:33 [[gitlab:paul26|@paul26]] was approved. * 04:18 "as1100k" was rejected (pending since 2024-12-24T04:18:06.813Z). === 2025-03-24 === * 11:33 "amzadkhankk" was rejected (pending since 2024-12-23T11:33:14.176Z). === 2025-03-23 === * 12:24 "wolfdo" was rejected (pending since 2024-12-22T12:23:35.056Z). === 2025-03-22 === * 09:45 [[gitlab:fjmustak|@fjmustak]] was approved. === 2025-03-20 === * 18:42 "sathishkokila" was rejected (pending since 2024-12-19T18:39:35.161Z). * 17:03 [[gitlab:alien4444|@alien4444]] was approved. * 15:27 [[gitlab:davidcoronel|@davidcoronel]] was approved. === 2025-03-19 === * 22:57 [[gitlab:r1f4t|@r1f4t]] was approved. * 19:03 "daniel24ps" was rejected (pending since 2024-12-18T19:00:21.249Z). * 14:18 [[gitlab:beepbooppenguin|@beepbooppenguin]] was approved. === 2025-03-18 === * 17:48 "rahulkundu1209" was rejected (pending since 2024-12-17T17:46:41.936Z). * 08:15 "kirtisikka972" was rejected (pending since 2024-12-17T08:13:25.487Z). === 2025-03-15 === * 13:30 "tulspal_sidhu" was rejected (pending since 2024-12-14T13:29:10.606Z). * 01:39 "peacedeadc" was rejected (pending since 2024-12-14T01:37:36.579Z). === 2025-03-14 === * 03:51 [[gitlab:chuckthebuck|@chuckthebuck]] was approved. * 02:33 "yxngtrtxll" was rejected (pending since 2024-12-13T02:31:51.658Z). === 2025-03-13 === * 14:36 [[gitlab:iccander|@iccander]] was approved. === 2025-03-12 === * 23:21 "jokerchic36" was rejected (pending since 2024-12-11T23:21:00.670Z). * 15:30 [[gitlab:naomi|@naomi]] was approved. * 15:27 [[gitlab:cobi|@cobi]] was approved. 
=== 2025-03-11 === * 12:42 "mohitvermaxx" was rejected (pending since 2024-12-10T12:40:56.967Z). === 2025-03-10 === * 16:51 [[gitlab:nanona15dobato|@nanona15dobato]] was approved. === 2025-03-09 === * 22:39 [[gitlab:jonkolbert|@jonkolbert]] was approved. * 20:45 [[gitlab:urbanecmtest2|@urbanecmtest2]] was approved. === 2025-03-07 === * 16:54 [[gitlab:hswan|@hswan]] was approved. * 14:42 [[gitlab:atitkov|@atitkov]] was approved. * 00:42 [[gitlab:infrastruktur|@infrastruktur]] was approved. === 2025-03-06 === * 17:21 "johnmann" was rejected (pending since 2024-12-05T17:19:24.995Z). === 2025-03-05 === * 07:33 [[gitlab:monx9494|@monx9494]] was approved. === 2025-03-02 === * 21:21 "paul26" was rejected (pending since 2024-12-01T21:20:19.681Z). === 2025-03-01 === * 19:15 [[gitlab:izno|@izno]] was approved. * 12:45 [[gitlab:nyerho|@nyerho]] was approved. === 2025-02-28 === * 18:27 [[gitlab:chuckonwumelu|@chuckonwumelu]] was approved. * 13:09 "ashwinpraveengo" was rejected (pending since 2024-11-29T13:07:47.240Z). * 00:18 "eduardoaugusto" was rejected (pending since 2024-11-29T00:17:43.372Z). === 2025-02-27 === * 20:39 "volkanurl" was rejected (pending since 2024-11-28T20:37:18.101Z). === 2025-02-24 === * 21:15 [[gitlab:feeglgeef|@feeglgeef]] was approved. * 20:18 [[gitlab:piaanalysis2|@piaanalysis2]] was approved. * 19:06 [[gitlab:dhardy|@dhardy]] was approved. === 2025-02-22 === * 19:27 [[gitlab:owuh|@owuh]] was approved. === 2025-02-19 === * 16:06 [[gitlab:artemkloko|@artemkloko]] was approved. * 13:03 [[gitlab:jgafnea|@jgafnea]] was approved. === 2025-02-17 === * 16:33 [[gitlab:asmartkitten|@asmartkitten]] was approved. === 2025-02-16 === * 19:12 "gaurigupta21" was rejected (pending since 2024-11-17T19:11:07.416Z). === 2025-02-15 === * 01:18 [[gitlab:mediawiki-quickstart-ci|@mediawiki-quickstart-ci]] was approved. === 2025-02-14 === * 15:21 "nathanbnm" was rejected (pending since 2024-11-15T15:18:19.632Z). 
=== 2025-02-13 === * 16:45 [[gitlab:priyanshuchahal|@priyanshuchahal]] was approved. * 16:42 [[gitlab:ajhalili2006|@ajhalili2006]] was approved. === 2025-02-12 === * 23:21 "monkeypatch999" was rejected (pending since 2024-11-13T23:20:38.398Z). * 06:36 [[gitlab:jainlakshita28|@jainlakshita28]] was approved. === 2025-02-11 === * 19:27 [[gitlab:matthewsm2|@matthewsm2]] was approved. === 2025-02-09 === * 16:15 "mohammed_abukhadra" was rejected (pending since 2024-11-10T16:15:18.361Z). === 2025-02-07 === * 21:33 "brennan" was rejected (pending since 2024-11-08T21:31:07.351Z). === 2025-02-06 === * 08:24 "mmta" was rejected (pending since 2024-11-07T08:22:36.724Z). * 06:21 [[gitlab:bunnypranav|@bunnypranav]] was approved. === 2025-02-05 === * 22:39 "chrissteinchen" was rejected (pending since 2024-11-06T22:38:16.673Z). === 2025-02-03 === * 07:45 "edriiic" was rejected (pending since 2024-11-04T07:44:46.849Z). * 01:12 "geppy" was rejected (pending since 2024-11-04T01:10:48.710Z). === 2025-02-02 === * 13:18 "funa-enpitu" was rejected (pending since 2024-11-03T13:15:46.065Z). === 2025-01-31 === * 23:42 "nfontes" was rejected (pending since 2024-11-01T23:39:41.755Z). * 22:51 "sbronson" was rejected (pending since 2024-11-01T22:50:31.871Z). * 00:42 [[gitlab:farid|@farid]] was approved. === 2025-01-27 === * 08:15 [[gitlab:eliza189|@eliza189]] was approved. === 2025-01-25 === * 09:51 [[gitlab:pamputt|@pamputt]] was approved. === 2025-01-23 === * 14:30 [[gitlab:lubianat|@lubianat]] was approved. * 11:45 [[gitlab:bootsa|@bootsa]] was approved. === 2025-01-21 === * 05:09 "niko" was rejected (pending since 2024-07-21T16:10:01.377Z). * 05:09 "thawizkid369777" was rejected (pending since 2024-07-18T17:42:44.493Z). * 05:09 "sarthaksingh2" was rejected (pending since 2024-07-10T11:31:30.470Z). * 05:09 "shriyakt" was rejected (pending since 2024-07-06T04:54:10.248Z). * 05:09 "akshaya" was rejected (pending since 2024-07-06T04:04:51.488Z). 
* 05:09 "alaka03aj" was rejected (pending since 2024-07-05T18:01:54.876Z). * 05:09 "sulochanaviji-5049" was rejected (pending since 2024-07-01T05:58:00.427Z). * 05:09 "nayanjnath" was rejected (pending since 2024-07-01T02:51:57.405Z). * 05:09 "sd44" was rejected (pending since 2024-06-30T04:28:51.436Z). * 05:09 "metavalent" was rejected (pending since 2024-06-29T01:37:14.210Z). * 05:09 "wicloudx" was rejected (pending since 2024-06-28T11:51:23.335Z). * 05:09 "debo" was rejected (pending since 2024-06-28T01:44:59.845Z). * 05:09 "bwiki" was rejected (pending since 2024-06-23T14:15:38.032Z). * 05:09 "toprak" was rejected (pending since 2024-06-23T11:35:50.819Z). * 05:09 "iristeller" was rejected (pending since 2024-06-14T20:53:48.959Z). * 05:09 "jcolvin" was rejected (pending since 2024-06-12T17:29:01.238Z). * 05:09 "kalyan" was rejected (pending since 2024-06-07T07:52:46.993Z). * 05:09 "bluecrystal" was rejected (pending since 2024-06-06T19:16:20.107Z). * 05:09 "iftttrohit" was rejected (pending since 2024-06-04T12:08:50.818Z). * 05:09 "pogpotato" was rejected (pending since 2024-06-03T17:58:21.684Z). * 05:09 "cptlausebaer" was rejected (pending since 2024-05-31T18:53:27.692Z). * 05:09 "hdevine825" was rejected (pending since 2024-05-31T17:04:18.279Z). * 05:09 "anaghaa18" was rejected (pending since 2024-05-25T19:14:31.803Z). * 05:09 "atharvanair04" was rejected (pending since 2024-05-25T14:24:52.825Z). * 05:09 "anasvemmully" was rejected (pending since 2024-05-25T06:10:27.261Z). * 05:09 "abhinavmohandas" was rejected (pending since 2024-05-25T06:05:24.825Z). * 05:09 "kksurendran06" was rejected (pending since 2024-05-25T06:04:38.082Z). * 05:09 "albertmarshall8896" was rejected (pending since 2024-05-23T09:32:05.462Z). * 05:09 "akellison" was rejected (pending since 2024-05-17T02:07:24.229Z). * 05:09 "mainowill" was rejected (pending since 2024-04-16T23:30:33.881Z). * 05:09 "bzhqc" was rejected (pending since 2024-04-16T19:50:38.676Z). 
* 05:09 "safan41" was rejected (pending since 2024-04-16T03:34:48.942Z). * 05:09 "mgagat" was rejected (pending since 2024-04-16T03:21:51.764Z). * 05:09 "okeamah" was rejected (pending since 2024-04-16T02:49:00.143Z). * 05:09 "xuhao61" was rejected (pending since 2024-04-15T23:45:09.083Z). * 04:47 "cybel" was rejected (pending since 2024-04-15T06:46:35.791Z). === 2025-01-20 === * 14:33 [[gitlab:your1|@your1]] was approved. === 2025-01-18 === * 10:09 [[gitlab:galrach600|@galrach600]] was approved. * 02:51 [[gitlab:blankeclair|@blankeclair]] was approved. === 2025-01-17 === * 13:57 [[gitlab:dsantamaria|@dsantamaria]] was approved. === 2025-01-15 === * 17:12 [[gitlab:smartse|@smartse]] was approved. === 2025-01-14 === * 17:03 [[gitlab:naorleizer|@naorleizer]] was approved. === 2025-01-13 === * 02:45 [[gitlab:wolf20482|@wolf20482]] was approved. === 2025-01-12 === * 17:45 [[gitlab:tamzin|@tamzin]] was approved. === 2025-01-11 === * 15:24 [[gitlab:bargioni|@bargioni]] was approved. * 14:30 [[gitlab:salelya|@salelya]] was approved. * 10:15 [[gitlab:malakatshy|@malakatshy]] was approved. * 05:21 [[gitlab:newmcpee|@newmcpee]] was approved. === 2025-01-09 === * 15:30 [[gitlab:gkyziridis|@gkyziridis]] was approved. === 2025-01-08 === * 16:21 [[gitlab:ukrface|@ukrface]] was approved. === 2024-12-28 === * 03:27 [[gitlab:twonum|@twonum]] was approved. === 2024-12-25 === * 06:09 [[gitlab:harsv567|@harsv567]] was approved. === 2024-12-21 === * 11:24 [[gitlab:amutha2002|@amutha2002]] was approved. === 2024-12-20 === * 19:51 [[gitlab:hridyeshgupta|@hridyeshgupta]] was approved. * 10:00 [[gitlab:ro-shines|@ro-shines]] was approved. * 08:09 [[gitlab:kesharwaniarpita|@kesharwaniarpita]] was approved. === 2024-12-18 === * 14:45 [[gitlab:soylacarli|@soylacarli]] was approved. === 2024-12-16 === * 20:33 [[gitlab:aleyasiddika1|@aleyasiddika1]] was approved. === 2024-12-15 === * 07:33 [[gitlab:abhishek02bhardwaj|@abhishek02bhardwaj]] was approved. 
=== 2024-12-13 === * 13:18 [[gitlab:ashmitabathre204|@ashmitabathre204]] was approved. === 2024-12-10 === * 06:39 [[gitlab:ginaan|@ginaan]] was approved. === 2024-12-09 === * 05:45 [[gitlab:kallinavya|@kallinavya]] was approved. * 00:54 [[gitlab:viserion-7|@viserion-7]] was approved. === 2024-12-08 === * 17:27 [[gitlab:wargo|@wargo]] was approved. === 2024-12-05 === * 11:15 [[gitlab:ranjithraj|@ranjithraj]] was approved. === 2024-12-02 === * 21:21 [[gitlab:a930913|@a930913]] was approved. === 2024-12-01 === * 02:39 [[gitlab:kingchristlike1|@kingchristlike1]] was approved. === 2024-11-21 === * 13:45 [[gitlab:sascha|@sascha]] was approved. === 2024-11-19 === * 16:36 [[gitlab:jly|@jly]] was approved. === 2024-11-15 === * 02:54 [[gitlab:danielyepezgarces|@danielyepezgarces]] was approved. === 2024-11-14 === * 14:15 [[gitlab:stimoroll|@stimoroll]] was approved. === 2024-11-09 === * 17:15 [[gitlab:f4udeveloper|@f4udeveloper]] was approved. === 2024-11-07 === * 19:15 [[gitlab:zulf|@zulf]] was approved. * 05:33 [[gitlab:hassanamin|@hassanamin]] was approved. === 2024-11-06 === * 19:39 [[gitlab:daniuu|@daniuu]] was approved. * 00:18 [[gitlab:rlopez-wmf|@rlopez-wmf]] was approved. === 2024-10-09 === * 14:45 [[gitlab:jtweed|@jtweed]] was approved. * 10:24 [[gitlab:ifrahkh|@ifrahkh]] was approved. * 09:06 [[gitlab:wikibayer|@wikibayer]] was approved. === 2024-10-06 === * 10:27 [[gitlab:keerthan16|@keerthan16]] was approved. === 2024-10-04 === * 07:45 [[gitlab:hakimi97|@hakimi97]] was approved. === 2024-09-30 === * 07:39 [[gitlab:ninjastrikers|@ninjastrikers]] was approved. === 2024-09-28 === * 17:30 [[gitlab:webrunner95|@webrunner95]] was approved. === 2024-09-18 === * 21:39 [[gitlab:elliottetzkorn|@elliottetzkorn]] was approved. === 2024-09-14 === * 22:06 [[gitlab:humptydumpty|@humptydumpty]] was approved. === 2024-09-06 === * 08:48 [[gitlab:mickabarber|@mickabarber]] was approved. === 2024-08-27 === * 17:36 [[gitlab:edgars|@edgars]] was approved. 
=== 2024-08-22 === * 09:18 [[gitlab:antonkokhwmde|@antonkokhwmde]] was approved. === 2024-08-14 === * 19:21 [[gitlab:jfk|@jfk]] was approved. === 2024-08-13 === * 17:57 [[gitlab:daxserver|@daxserver]] was approved. === 2024-08-11 === * 09:57 [[gitlab:pauliesnug|@pauliesnug]] was approved. === 2024-08-10 === * 08:42 [[gitlab:ashig|@ashig]] was approved. === 2024-08-09 === * 14:09 [[gitlab:masssly|@masssly]] was approved. === 2024-08-05 === * 22:15 [[gitlab:mrtortue|@mrtortue]] was approved. === 2024-08-02 === * 16:21 [[gitlab:dsantini|@dsantini]] was approved. === 2024-07-31 === * 11:54 [[gitlab:cptviraj|@cptviraj]] was approved. === 2024-07-30 === * 19:09 [[gitlab:iniquity|@iniquity]] was approved. * 10:00 [[gitlab:collins|@collins]] was approved. === 2024-07-27 === * 15:57 [[gitlab:songnguxyz|@songnguxyz]] was approved. === 2024-07-25 === * 12:36 [[gitlab:mszabo|@mszabo]] was approved. * 09:21 [[gitlab:agarwalmahima|@agarwalmahima]] was approved. === 2024-07-24 === * 08:05 [[gitlab:dragoniez|@dragoniez]] was approved. === 2024-07-23 === * 06:54 [[gitlab:mirji|@mirji]] was approved. === 2024-07-16 === * 10:00 [[gitlab:lakejason0|@lakejason0]] was approved. === 2024-07-12 === * 11:33 [[gitlab:cn|@cn]] was approved. * 08:12 [[gitlab:unchampignon|@unchampignon]] was approved. === 2024-07-07 === * 17:12 [[gitlab:agamyasamuel|@agamyasamuel]] was approved. * 05:24 [[gitlab:kuldeepburjbhalaike|@kuldeepburjbhalaike]] was approved. === 2024-07-06 === * 11:18 [[gitlab:dibya|@dibya]] was approved. * 04:54 [[gitlab:sarthakparashar|@sarthakparashar]] was approved. === 2024-07-05 === * 18:15 [[gitlab:vanshikarathi|@vanshikarathi]] was approved. === 2024-07-02 === * 19:00 [[gitlab:ebrahim|@ebrahim]] was approved. === 2024-07-01 === * 20:12 [[gitlab:rockingpenny4|@rockingpenny4]] was approved. * 18:15 [[gitlab:balajijagadesh|@balajijagadesh]] was approved. === 2024-06-30 === * 18:24 [[gitlab:hrideshmg|@hrideshmg]] was approved. 
* 07:18 [[gitlab:chanakyakumardas|@chanakyakumardas]] was approved. * 06:30 [[gitlab:rihaan180|@rihaan180]] was approved. === 2024-06-27 === * 17:36 [[gitlab:driedmueller|@driedmueller]] was approved. === 2024-06-19 === * 12:57 [[gitlab:audreypenven|@audreypenven]] was approved. === 2024-06-16 === * 01:18 [[gitlab:roysmith|@roysmith]] was approved. === 2024-06-08 === * 02:45 [[gitlab:jleedev|@jleedev]] was approved. === 2024-06-03 === * 13:57 [[gitlab:afeder|@afeder]] was approved. === 2024-06-01 === * 10:54 [[gitlab:florianschmitt|@florianschmitt]] was approved. === 2024-05-30 === * 16:42 [[gitlab:krlsca|@krlsca]] was approved. === 2024-05-28 === * 11:24 [[gitlab:rickijay|@rickijay]] was approved. === 2024-05-26 === * 11:18 [[gitlab:ranjithsiji|@ranjithsiji]] was approved. === 2024-05-25 === * 07:24 [[gitlab:jony|@jony]] was approved. === 2024-05-23 === * 08:45 [[gitlab:lepticed7|@lepticed7]] was approved. === 2024-05-22 === * 20:42 [[gitlab:echecs|@echecs]] was approved. === 2024-05-21 === * 13:33 [[gitlab:mbs|@mbs]] was approved. === 2024-05-19 === * 18:06 [[gitlab:ionenlaser|@ionenlaser]] was approved. === 2024-05-18 === * 23:36 [[gitlab:mdaniels5757|@mdaniels5757]] was approved. === 2024-05-17 === * 08:54 [[gitlab:grapedog|@grapedog]] was approved. === 2024-05-08 === * 19:42 [[gitlab:kelhurd|@kelhurd]] was approved. * 19:06 [[gitlab:khurd|@khurd]] was approved. === 2024-05-06 === * 19:48 [[gitlab:j3j5|@j3j5]] was approved. * 12:06 [[gitlab:tk-999|@tk-999]] was approved. === 2024-05-05 === * 22:09 [[gitlab:pppery|@pppery]] was approved. * 20:33 [[gitlab:sakretsu|@sakretsu]] was approved. * 12:12 [[gitlab:waterquark|@waterquark]] was approved. === 2024-05-04 === * 09:03 [[gitlab:multichill|@multichill]] was approved. * 07:42 [[gitlab:abaris|@abaris]] was approved. === 2024-05-03 === * 14:57 [[gitlab:maurusian|@maurusian]] was approved. === 2024-04-24 === * 05:48 [[gitlab:wolfinux|@wolfinux]] was approved. 
=== 2024-04-23 === * 15:48 [[gitlab:dreamrimmer|@dreamrimmer]] was approved. === 2024-04-21 === * 06:51 [[gitlab:alon|@alon]] was approved. === 2024-04-17 === * 23:33 [[gitlab:derenrich|@derenrich]] was approved. === 2024-04-16 === * 17:18 [[gitlab:valcio|@valcio]] was approved. === 2024-04-14 === * 16:51 [[gitlab:wikilucas00|@wikilucas00]] was approved. === 2024-04-06 === * 12:48 [[gitlab:theprotonade|@theprotonade]] was approved. === 2024-04-02 === * 07:30 [[gitlab:bohuizhang|@bohuizhang]] was approved. === 2024-03-30 === * 13:36 [[gitlab:lpintscher|@lpintscher]] was approved. === 2024-03-26 === * 17:09 [[gitlab:eenabulele|@eenabulele]] was approved. === 2024-03-25 === * 14:27 [[gitlab:tuukka|@tuukka]] was approved. === 2024-03-24 === * 12:24 [[gitlab:firefly|@firefly]] was approved. === 2024-03-21 === * 19:33 [[gitlab:universal-omega|@universal-omega]] was approved. === 2024-03-17 === * 10:36 [[gitlab:bisel91|@bisel91]] was approved. === 2024-03-16 === * 10:09 [[gitlab:delord|@delord]] was approved. * 00:42 [[gitlab:athulvis1|@athulvis1]] was approved. === 2024-03-15 === * 19:06 [[gitlab:ignaciorodrguez|@ignaciorodrguez]] was approved. * 08:30 [[gitlab:peachey88|@peachey88]] was approved. * 06:51 [[gitlab:derick|@derick]] was approved. === 2024-03-12 === * 15:06 [[gitlab:xiaoxiao|@xiaoxiao]] was approved. === 2024-03-06 === * 13:21 [[gitlab:desianabae1|@desianabae1]] was approved. === 2024-03-05 === * 19:21 [[gitlab:ep1c|@ep1c]] was approved. * 16:33 [[gitlab:jasmine|@jasmine]] was approved. === 2024-03-02 === * 06:42 [[gitlab:potsdamlamb|@potsdamlamb]] was approved. === 2024-02-29 === * 23:18 [[gitlab:arandomname123|@arandomname123]] was approved. * 18:03 [[gitlab:baba|@baba]] was approved. * 17:48 [[gitlab:yfdyh000|@yfdyh000]] was approved. * 03:09 [[gitlab:sds|@sds]] was approved. === 2024-02-27 === * 23:33 [[gitlab:lofhi|@lofhi]] was approved. === 2024-02-15 === * 19:45 [[gitlab:gergesshamon|@gergesshamon]] was approved. 
=== 2024-02-14 === * 14:33 [[gitlab:philipnelson99|@philipnelson99]] was approved. === 2024-02-13 === * 13:06 [[gitlab:dringsim|@dringsim]] was approved. === 2024-02-12 === * 17:36 [[gitlab:haak|@haak]] was approved. === 2024-02-05 === * 17:33 [[gitlab:qwerfjkl|@qwerfjkl]] was approved. * 17:14 [[gitlab:ahecht|@ahecht]] was approved. === 2024-02-01 === * 09:27 [[gitlab:arinaigum|@arinaigum]] was approved. * 00:15 [[gitlab:jas42|@jas42]] was approved. * 00:15 [[gitlab:edhu|@edhu]] was approved. * 00:15 [[gitlab:marnanel|@marnanel]] was approved. * 00:15 [[gitlab:ibrahemqasim|@ibrahemqasim]] was approved. * 00:15 [[gitlab:amasotti|@amasotti]] was approved. * 00:15 [[gitlab:deni|@deni]] was approved. * 00:15 [[gitlab:cyber|@cyber]] was approved. * 00:15 [[gitlab:saroj|@saroj]] was approved. === 2024-01-29 === * 21:42 [[gitlab:rgupta|@rgupta]] was approved. === 2024-01-07 === * 09:48 [[gitlab:lutrome|@lutrome]] was approved. === 2024-01-05 === * 20:48 [[gitlab:jinoytommanjaly|@jinoytommanjaly]] was approved. * 02:51 [[gitlab:braunobruno|@braunobruno]] was approved. * 01:08 [[gitlab:amorymeltzer|@amorymeltzer]] was approved. * 01:08 [[gitlab:phi22ipus|@phi22ipus]] was approved. === 2024-01-03 === * 14:45 [[gitlab:gabina|@gabina]] was approved. === 2024-01-02 === * 13:18 [[gitlab:arthurtaylor|@arthurtaylor]] was approved. === 2023-12-23 === * 00:33 [[gitlab:aram|@aram]] was approved. === 2023-12-22 === * 16:24 [[gitlab:elpitareio|@elpitareio]] was approved. === 2023-12-21 === * 00:43 [[gitlab:bsadowski1|@bsadowski1]] was approved. * 00:43 [[gitlab:ederporto|@ederporto]] was approved. * 00:43 [[gitlab:sadraiiali|@sadraiiali]] was approved. * 00:43 [[gitlab:wasp-outis|@wasp-outis]] was approved. * 00:43 [[gitlab:bodhisattwa|@bodhisattwa]] was approved. * 00:43 [[gitlab:air7538|@air7538]] was approved. * 00:43 [[gitlab:anzx|@anzx]] was approved. * 00:43 [[gitlab:tekask1903|@tekask1903]] was approved. * 00:42 [[gitlab:kiwi-0x010c|@kiwi-0x010c]] was approved. 
* 00:42 [[gitlab:mpaa|@mpaa]] was approved.
* 00:42 [[gitlab:kutay|@kutay]] was approved.
* 00:42 [[gitlab:wattmto|@wattmto]] was approved.

so5avzgrq0yzfa5ylg0cn45dewgwzan User:FFurnari-WMF/HaproxyAWSLC 2 459052 2398845 2396301 2026-04-04T04:54:40Z Quiddity 1884 lang=text 2398845 wikitext text/x-wiki

== Performance comparison between HAProxy 3.2 + OpenSSL 3.5 vs HAProxy 3.2 + AWS-LC (Debian Trixie) ==

=== Host setup ===
* Using cp2041 and cp2042 to host, respectively, HAProxy 3.2 + OpenSSL 3.5 and HAProxy 3.2 + AWS-LC. These hosts are depooled and [[phab:T419753|waiting to be decommissioned]].
* On both hosts:
** Puppet has been disabled
** HaproxyKafka service has been stopped and disabled to avoid polluting metrics
** haproxy-mtail service has been stopped and disabled to avoid polluting metrics
** Installed (manually) python3.13-venv to run the benchmark script
** Fetched the [https://gitlab.wikimedia.org/fabfur/benchmark-scripts/-/tree/main/http/benchmark-curl?ref_type=heads benchmark script from the GitLab repo]
** Created a Python virtualenv and installed the requirements for the script above
* On cp2041:
** Before disabling Puppet, installed HAProxy 3.2 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1254195)
* On cp2042:
** Installed the [https://www.haproxy.com/downloads official HAProxy performance packages] <code>haproxy-awslc</code> and <code>libssl-awslc</code>
** Changed HAProxy configuration to adapt to AWS-LC:<syntaxhighlight lang="diff">
@@ -26,7 +26,7 @@
     ssl-default-bind-options ssl-min-ver TLSv1.2 ssl-max-ver TLSv1.3
     ssl-default-bind-ciphers -ALL:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256
     ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_256_GCM_SHA384
-    #ssl-dh-param-file /etc/ssl/dhparam.pem
+    ssl-dh-param-file /etc/ssl/dhparam.pem
     tune.ssl.cachesize 512000
     tune.ssl.lifetime 86400
     maxconn 200000
</syntaxhighlight>
** Set '''/etc/default/haproxy''' back to using EXTRAOPTS (overridden by the haproxy-awslc package?): <code>EXTRAOPTS="-f /etc/haproxy/conf.d"</code>
** Created '''/etc/systemd/system/haproxy.service''' (copied from other hosts' '''/lib/systemd/system/haproxy.service''')

=== Run the benchmark script ===
* On both hosts: remember to unset the envvars: <code>HTTP_PROXY</code>, <code>HTTPS_PROXY</code>, <code>http_proxy</code>, <code>https_proxy</code>, <code>NO_PROXY</code>, <code>no_proxy</code>

Example benchmark script run for text and upload (with output):

* cp2041 (HAProxy 3.2 + OpenSSL 3.5) <code>benchmark-curl.py --concurrency 100 --target https://en.wikipedia.org/ --resolve en.wikipedia.org:443:127.0.0.1 --num-requests 10000 --http-version 2 --header 'nonexistant'</code><syntaxhighlight lang="text">
+---------------------------+--------------+---------------+--------------+----------+------------------+
| Target                    |   Requests # |   Concurrency |   Successful |   Failed |   Total time (s) |
+===========================+==============+===============+==============+==========+==================+
| https://en.wikipedia.org/ |        10000 |           100 |        10000 |        0 |          2.42375 |
+---------------------------+--------------+---------------+--------------+----------+------------------+
+---------------+-------+-------+-------+-------+---------+----------+-------+
| description   |   p50 |   p75 |   p95 |   p99 |   p99.9 |   p99.99 |   max |
+===============+=======+=======+=======+=======+=========+==========+=======+
| TTFB          |  0.22 |  0.26 |  0.37 |  8.83 |   39.53 |    41.40 | 41.47 |
+---------------+-------+-------+-------+-------+---------+----------+-------+
| TLS handshake |  0.00 |  0.00 |  0.00 |  0.00 |    0.00 |     0.00 | 20.26 |
+---------------+-------+-------+-------+-------+---------+----------+-------+
| TCP handshake |  0.00 |  0.00 |  0.00 |  0.00 |    0.00 |     0.00 |  2.64 |
+---------------+-------+-------+-------+-------+---------+----------+-------+
+---------------+---------+-------------+
|   Status code |   Count |   Share (%) |
+===============+=========+=============+
|           400 |   10000 |         100 |
+---------------+---------+-------------+
</syntaxhighlight>
* cp2042 (HAProxy 3.2 + AWS-LC) <code>benchmark-curl.py --concurrency 100 --target <nowiki>https://upload.wikimedia.org/nonexistent</nowiki> --resolve upload.wikimedia.org:443:127.0.0.1 --num-requests 10000 --http-version 2 --header 'nonexistant'</code><syntaxhighlight lang="text">
+------------------------------------------+--------------+---------------+--------------+----------+------------------+
| Target                                   |   Requests # |   Concurrency |   Successful |   Failed |   Total time (s) |
+==========================================+==============+===============+==============+==========+==================+
| https://upload.wikimedia.org/nonexistent |        10000 |           100 |        10000 |        0 |           2.0205 |
+------------------------------------------+--------------+---------------+--------------+----------+------------------+
+---------------+-------+-------+-------+-------+---------+----------+-------+
| description   |   p50 |   p75 |   p95 |   p99 |   p99.9 |   p99.99 |   max |
+===============+=======+=======+=======+=======+=========+==========+=======+
| TTFB          |  0.20 |  0.20 |  0.23 |  1.67 |   31.31 |    32.60 | 33.62 |
+---------------+-------+-------+-------+-------+---------+----------+-------+
| TLS handshake |  0.00 |  0.00 |  0.00 |  0.00 |    0.00 |     0.00 | 17.42 |
+---------------+-------+-------+-------+-------+---------+----------+-------+
| TCP handshake |  0.00 |  0.00 |  0.00 |  0.00 |    0.00 |     0.00 |  2.27 |
+---------------+-------+-------+-------+-------+---------+----------+-------+
+---------------+---------+-------------+
|   Status code |   Count |   Share (%) |
+===============+=========+=============+
|           400 |   10000 |         100 |
+---------------+---------+-------------+
</syntaxhighlight>

{{collapse top|Old procedure}}

=== OLD ===

=== Procedure [draft] ===
This procedure is meant to be a checklist to review before starting to test the differences between OpenSSL 3.5, OpenSSL 1.1.1 (current baseline)
and AWS-LC in terms of performance and any unexpected issues. The procedure focuses on using the different TLS libraries with HAProxy 3.2. ==== Hosts preparation ==== ===== Preliminary steps ===== * Depool and silence 2 hosts to use as server and client ** <code>cp7001.magru.wmnet</code> and <code>cp7009.magru.wmnet</code> are in the same rack and can be used for this task ** From now on the two hosts will be labeled as '''[client]''' and '''[server]''' ** Puppet agent must be disabled on '''[server]''' to avoid overwriting local modifications. ===== Packages and libraries ===== Copy the packages and libraries to '''[server]''' (see below for package building) {{Note|text=Packages and binaries listed below are also present on '''apt1002.wikimedia.org:/home/fabfur/tls-tests/'''}} {{Note|text=Nomenclature for HAProxy packages is '''<haproxy_deb_version>_identifier_<...>''' where identifier can be '''os111''' for packages built against OpenSSL 1.1.1, '''os35''' for packages built against OpenSSL 3.5 and '''awslc''' for packages built against AWS-LC libraries}} * OpenSSL 3.5 packages <pre>libssl3t64_3.5.1-1_amd64.deb libssl3t64-dbgsym_3.5.1-1_amd64.deb libssl-dev_3.5.1-1_amd64.deb openssl_3.5.1-1_amd64.deb openssl-dbgsym_3.5.1-1_amd64.deb openssl-provider-legacy_3.5.1-1_amd64.deb openssl-provider-legacy-dbgsym_3.5.1-1_amd64.deb</pre> * HAProxy 3.2 compiled against OpenSSL 1.1.1: <pre>haproxy-dbgsym_3.2.4-1.1os111_amd64.deb haproxy_3.2.4-1.1os111_amd64.deb</pre> * HAProxy 3.2 compiled against OpenSSL 3.5 <pre> haproxy_3.2.4-1.1os35_amd64.deb haproxy-dbgsym_3.2.4-1.1os35_amd64.deb </pre> * HAProxy 3.2 compiled against AWS-LC <pre>haproxy_3.2.4-1awslc_amd64.deb haproxy-dbgsym_3.2.4-1awslc_amd64.deb</pre> * AWS-LC libraries aren't packaged (yet) but must be copied into the '''/opt/awslc/{bin,include,lib}''' paths {{Note|text=AWS-LC is distributed as source and must be compiled in advance (see instructions below). 
If the test results indicate superior performance compared to the other libraries, we will create a dedicated Debian package for it (not really useful in the test context)}} <pre> |-- bin | |-- bssl | |-- c_rehash | `-- openssl |-- include | `-- openssl `-- lib |-- crypto |-- libcrypto.so |-- libssl.so |-- pkgconfig `-- ssl </pre> ===== HAProxy configuration ===== * On '''[server]''' HAProxy can reuse the same certificates and tls-ticket keys as usual, but the configuration can be vastly simplified for testing. Also, logs shouldn't be sent to the HaproxyKafka socket or we'll pollute analytics data. A minimal configuration file like this can be copied over '''/etc/haproxy/haproxy.cfg''' {{Note|text=The following configuration is vastly simplified, keeping only the TLS parameters in common with the production one, as that's the only thing we're interested in testing in this case}} <syntaxhighlight lang="cfg"> global user haproxy group haproxy stats socket /run/haproxy/haproxy.sock mode 600 expose-fd listeners level admin log /var/lib/haproxy/dev/log local0 info #log /var/run/haproxykafka/haproxykafka.sock len 8192 format rfc5424 local0 info tune.http.logurilen 2048 # do not keep old processes longer than 5m after a reload hard-stop-after 5m set-dumpable nbthread 48 cpu-map 1/1- 0 48 2 50 4 52 6 54 8 56 10 58 12 60 14 62 16 64 18 66 20 68 22 70 24 72 26 74 28 76 30 78 32 80 34 82 36 84 38 86 40 88 42 90 44 92 46 94 #lua-load-per-thread /etc/haproxy/lua/maxmind-lookup.lua ssl-default-bind-options ssl-min-ver TLSv1.2 ssl-max-ver TLSv1.3 ssl-default-bind-ciphers -ALL:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256 ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_CHACHA20_POLY1305_SHA256:TLS_AES_256_GCM_SHA384 ssl-dh-param-file /etc/ssl/dhparam.pem tune.ssl.cachesize 512000 tune.ssl.lifetime 86400 maxconn 200000 tune.h2.header-table-size 4096 tune.h2.initial-window-size 65535 tune.h2.max-concurrent-streams 100 defaults 
mode http log-format "%rt %Tr %Tw %Tc %ST {%[capture.req.hdr(0)]} {%[capture.res.hdr(0)]} %ts" #log-format-sd %{+E}o\ [haproxykafka@0\ server_pid=\"%pid\"\ ip=\"%ci\"\ sequence=\"%ID\"\ dt=\"%tr\"\ time_backend_response=\"%Tr\"\ http_status=\"%ST\"\ response_size=\"%B\"\ termination_state=\"%ts\"\ uri_host=\"%[capture.req.hdr(0)]\"\ referer=\"%[capture.req.hdr(1)]\"\ user_agent=\"%[capture.req.hdr(2)]\"\ accept_language=\"%[capture.req.hdr(3)]\"\ range=\"%[capture.req.hdr(4)]\"\ accept=\"%[capture.req.hdr(5)]\"\ tls=\"%[capture.req.hdr(6)]\"\ cache_status=\"%[var(txn.x_cache_status)]\"\ content_type=\"%[var(txn.content_type)]\"\ x_analytics=\"%[var(txn.x_analytics)]\"\ x_cache=\"%[var(txn.x_cache)]\"\ backend=\"%[var(txn.server)]\"\ http_method=\"%HM\"\ uri_path=\"%HPO\"\ uri_query=\"%HQ\"] option httplog option dontlognull option accept-invalid-http-request option accept-invalid-http-response option http-ignore-probes retries 1 timeout connect 50000 timeout client 500000 timeout server 500000 frontend http # Just for test bind :80 mode http default_backend ok listen https log global maxconn 199000 bind :443 tfo ssl crt-list /etc/haproxy/crt-list.cfg tls-ticket-keys /run/haproxy-secrets/stek.keys bind :::443 tfo v6only ssl crt-list /etc/haproxy/crt-list.cfg tls-ticket-keys /run/haproxy-secrets/stek.keys timeout http-request 3600s timeout http-keep-alive 120s timeout client 120s timeout client-fin 120s timeout connect 3s timeout server 180s timeout tunnel 3600s default_backend ok backend ok http-request return status 200 content-type "text/plain" string "OK" </syntaxhighlight> ===== Metrics and logs ===== * Stop and disable <code>haproxykafka.service</code> before starting test to avoid polluting analytics data * Edit <code>mtail</code> configuration to avoid polluting SLO metrics: <syntaxhighlight lang="diff"> @@ -10,15 +10,11 @@ histogram haproxy_client_ttfb by cache_status, http_status_family buckets -1, 0.001, 0.005, 0.01, 0.02, 0.045, 0.07, 0.1, 0.15, 0.25, 
0.35, 0.5, 0.75, 1.2, 3.0, 10.0, 30.0, 60.0 histogram haproxy_client_healthcheck_ttfb by cache_status, http_status_family buckets -1, 0.001, 0.005, 0.01, 0.02, 0.045, 0.07, 0.1, 0.15, 0.25, 0.35, 0.5, 0.75, 1.2, 3.0, 10.0, 30.0, 60.0 counter haproxy_termination_states_total by termination_state -counter haproxy_sli_total -counter haproxy_sli_good -counter haproxy_sli_bad hidden text cstatus hidden gauge process_time hidden text http_status_family / \d+ (?P<client_ttfb>\-?\d+) (?P<queue_time>\-?\d+) (?P<server_connection_time>\-?\d+) (?P<http_status_family>[1-5|\-])(1|\d\d) {(?P<host>[0-9A-Za-z\-\.:]+)} {(?P<cache_status>[a-z-]*)} (?P<termination_state>[A-Za-z-]{2})$/ { - haproxy_sli_total++ process_time = 0 $http_status_family =~ /^\-/ { @@ -46,16 +42,4 @@ $server_connection_time > 0 { process_time += $server_connection_time } - # We are excluding the following states: - # More details on http://docs.haproxy.org/2.6/configuration.html#8.5 - # R --> Resource on the proxy has been exhausted - # I --> Internal error - # D --> Session killed by HAProxy - # U --> Session killed by HAProxy (this shouldn't happen here) - # K --> Session actively killed by an admin operating on HAProxy (HAProxy config/TLS material reloads would trigger this one) - $termination_state =~ /^[\-CSPLcs]/ && process_time < 50 { - haproxy_sli_good++ - } else { - haproxy_sli_bad++ - } } </syntaxhighlight> * Restart the <code>haproxy-mtail@tls.service</code> ==== Client preparation ==== * The only needed setup on the '''[client]''' should be depooling and copying the required scripts for benchmarking * Benchmarking tools can be found at https://gitlab.wikimedia.org/fabfur/benchmark-scripts ==== Running the benchmarks ==== * Overall procedure: ** '''OpenSSL 1.1.1''' ** Install HAProxy 3.2 os111 (compiled against OpenSSL 1.1.1) on '''[server]''' using the debian packages (<code>haproxy-dbgsym_3.2.4-1.1os111_amd64.deb haproxy_3.2.4-1.1os111_amd64.deb</code>) {{Note|text=This will use OpenSSL 1.1.1 
version that is already installed on Bullseye, so no need to install other packages}} ** Ensure the HAProxy version and compilation options are the expected ones (<code>haproxy -vv</code>, <code>openssl version</code>) ** Ensure the HAProxy configuration is the "custom" one ** Start HAProxy with the shipped systemd unit (already present on Debian Bullseye cache hosts) ** Check HAProxy's journal log for any errors (leave it open) ** Gather and save metrics on '''[server]''' *** Start <code>perf top</code> on '''[server]''' to gather system metrics for later inspection *** Start <code>perf record -F 99 -p <HAPROXY_PID> -g</code> on '''[server]''' and stop it after each benchmark ** Start benchmarks on '''[client]''' *** Run curl benchmark script: **** To run the script on the same NUMA node as the NIC: ***** (example) <code>cat /sys/class/net/eno12399np0/device/numa_node</code> ***** (example) <code>numactl --cpunodebind=0 --membind=0 ./benchmark-curl.py --target 10.140.0.7 --num-requests 1000000 --concurrency 1000</code> ***** Repeat for concurrency = 10_000 and 100_000 *** Run TLS benchmark script: **** Same procedure as above to run the script on the same NUMA node as the NIC **** <code>numactl --cpunodebind=0 --membind=0 ./tls-tester --endpoint 10.140.0.7 --requests 1000000 --goroutines 1000</code> **** Repeat for concurrency = 10_000 and 100_000 ** After the benchmark *** Copy perf result files locally ** Remove HAProxy 3.2 os111 package from '''[server]''' ** '''OpenSSL 3.5''' ** Install OpenSSL 3.5 packages over existing ones (upgrade) ** Install HAProxy 3.2 os35 (compiled against OpenSSL 3.5) packages ** Ensure the HAProxy and OpenSSL versions are the correct ones, then restart the haproxy systemd unit ** Start benchmarks on the client as above and gather metrics ** Remove HAProxy 3.2 os35 packages from '''[server]''' ** '''AWS-LC''' ** Copy (if not already done) the awslc binaries to '''/opt''' on '''[server]''' (as described above) ** Install HAProxy 3.2 awslc (compiled against AWS-LC) packages ** 
Ensure the HAProxy and OpenSSL versions are the correct ones, then restart the haproxy systemd unit ** Start benchmarks on the client as above and gather metrics ==== Cleanup ==== * Repool '''[client]''' * Reimage '''[server]''' and repool it {{Note|text=The following instructions may be outdated. Use them as a reference to build the various packages if needed}} === Build HAProxy 3.2, AWSLC and OpenSSL 3.5 on Debian Bullseye === The following has been performed on a Debian Bullseye container created with this Dockerfile:<syntaxhighlight lang="dockerfile"> FROM docker-registry.wikimedia.org/bullseye:latest ENV container=docker ENV LC_ALL=C ENV DEBIAN_FRONTEND=noninteractive WORKDIR /opt RUN apt-get update && apt-get install -y libpcre2-dev libjemalloc-dev python3-sphinx zlib1g-dev build-essential devscripts libssl-dev liblua5.4-dev python3-mako cmake libsystemd-dev pkgconf debhelper RUN apt-get install -y curl wget vim # RUN apt-get install -y systemd-dev libopentracing-c-wrapper-dev RUN echo "deb http://apt.wikimedia.org/wikimedia bullseye-wikimedia component/golang" > /etc/apt/sources.list.d/wikimedia.list && apt-get update && apt-get install -y golang-1.23 RUN update-alternatives --install /usr/bin/go go /usr/lib/go-1.23/bin/go 3 --slave /usr/bin/gofmt gofmt /usr/lib/go-1.23/bin/gofmt WORKDIR /opt VOLUME /opt </syntaxhighlight> === Build AWSLC === Use the following script to fetch, build and install AWSLC into '''/opt/awslc''' (standard path used by others). Run it on the Debian Bullseye container (mount '''/opt''' locally for easy debugging and copying).<syntaxhighlight lang="bash"> #!/bin/bash set -e CODE_PATH=/tmp/aws-lc BUILD_PATH=/tmp/aws-lc-build INSTALL_PATH=/opt/awslc echo "[*] fetching dependencies..." apt-get update apt-get install -y git cmake ninja-build perl clang tree echo echo "[*] cloning awslc in ${CODE_PATH}" if [[ -d ${CODE_PATH} ]]; then echo "[!] 
removing awslc code in ${CODE_PATH}" rm -fr "${CODE_PATH}" fi git clone --depth=1 https://github.com/aws/aws-lc.git "${CODE_PATH}" echo echo "[*] building awslc (build path: ${BUILD_PATH}, destination: ${INSTALL_PATH})" if [[ -d ${BUILD_PATH} ]]; then echo "[!] removing awslc build dir ${BUILD_PATH}" rm -fr "${BUILD_PATH}" fi mkdir -p "${BUILD_PATH}" cd "${BUILD_PATH}" cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="${INSTALL_PATH}" -DBUILD_SHARED_LIBS=1 "${CODE_PATH}" echo # echo "[*] running tests" # ninja run_tests # echo echo "[*] installing into ${INSTALL_PATH}" ninja install tree -L 2 "${INSTALL_PATH}" echo echo "[*] awslc installed in ${INSTALL_PATH}" </syntaxhighlight> === Build HAProxy 3.2 with OpenSSL 1.1.1 === Run the following script on the Debian Bullseye container (mount '''/opt''' locally for easy debugging and copying).<syntaxhighlight lang="bash"> #!/bin/bash set -e HAPROXY_SRC_PATH=/opt/haproxy-openssl-1.1.1 HAPROXY_DEB_PATH="$HAPROXY_SRC_PATH/haproxy-3.2.3" if [[ ! -d "$HAPROXY_DEB_PATH" ]]; then mkdir -p "$HAPROXY_SRC_PATH" cd "$HAPROXY_SRC_PATH" dget -u http://deb.debian.org/debian/pool/main/h/haproxy/haproxy_3.2.3-2.dsc cd "$HAPROXY_DEB_PATH" dpkg-buildpackage -b else echo "[*] $HAPROXY_DEB_PATH already exists" cd "$HAPROXY_DEB_PATH" fakeroot debian/rules clean dpkg-buildpackage -b echo fi cd "$HAPROXY_DEB_PATH"/ </syntaxhighlight> === Build HAProxy 3.2 with AWSLC === The patched code for HAProxy to use AWSLC can be found at https://gitlab.wikimedia.org/fabfur/haproxy-awslc/-/tree/awslc-3.2/debian?ref_type=heads The following script (to be run on the Debian Bullseye container) clones it and builds the package.<syntaxhighlight lang="bash"> #!/bin/bash set -e HAPROXY_DEB_PATH=/tmp/haproxy-deb HAPROXY_REPO=https://gitlab.wikimedia.org/fabfur/haproxy-awslc.git HAPROXY_AWSLC_BRANCH=awslc-3.2 # Standard path for aws-lc AWSLC_PATH=/opt/awslc echo "[*] fetching dependencies ..." 
apt-get update apt-get install -y devscripts git echo echo "[*] downloading haproxy source package in ${HAPROXY_DEB_PATH}" if [[ -d "${HAPROXY_DEB_PATH}" ]]; then echo "[!] removing ${HAPROXY_DEB_PATH}" rm -fr "${HAPROXY_DEB_PATH}" fi mkdir "${HAPROXY_DEB_PATH}" cd "${HAPROXY_DEB_PATH}" git clone --depth=1 --branch "$HAPROXY_AWSLC_BRANCH" "${HAPROXY_REPO}" "${HAPROXY_DEB_PATH}/haproxy-awslc" echo echo "[*] Building haproxy binary package against awslc (${AWSLC_PATH})" cd "${HAPROXY_DEB_PATH}/haproxy-awslc" dpkg-buildpackage -b cd .. echo "[*] Debian packages in ${HAPROXY_DEB_PATH}" echo echo "[*] Verify binaries dependencies" ldd "${HAPROXY_DEB_PATH}/haproxy-awslc/debian/haproxy/usr/sbin/haproxy" echo </syntaxhighlight> === Build OpenSSL 3.5 on Debian Bullseye === The following script can be used as a guideline to build OpenSSL and the related libraries/Debian packages on Debian Bullseye. Note that the patched repository for this lives in https://gitlab.wikimedia.org/fabfur/openssl#<syntaxhighlight lang="bash"> #!/bin/bash set -e OPENSSL_DEB_PATH=/tmp/openssl-deb OPENSSL_REPO=https://gitlab.wikimedia.org/fabfur/openssl.git echo "[*] fetching dependencies ..." apt-get update apt-get install -y quilt echo echo "[*] downloading openssl repo in ${OPENSSL_DEB_PATH}" if [[ -d "${OPENSSL_DEB_PATH}" ]]; then echo "[!] removing ${OPENSSL_DEB_PATH}" rm -fr "${OPENSSL_DEB_PATH}" fi mkdir "${OPENSSL_DEB_PATH}" cd "${OPENSSL_DEB_PATH}" git clone --depth=1 $OPENSSL_REPO "${OPENSSL_DEB_PATH}/openssl-3.5" echo echo "[*] building openssl 3.5 ..." cd "${OPENSSL_DEB_PATH}/openssl-3.5" quilt push -a dpkg-buildpackage -b cd .. echo "[*] Debian packages in ${OPENSSL_DEB_PATH}" echo </syntaxhighlight> '''The following instructions and benchmarks were based on Debian Trixie, so I keep them around for reference only. The distribution used to build and test the various pieces is Bullseye instead. 
Please refer to above paragraphs.''' All steps must be performed in a trixie environment ==== Step 1 - script to compile and run awslc from source ==== <syntaxhighlight lang="bash"> #!/bin/bash set -e CODE_PATH=/tmp/aws-lc BUILD_PATH=/tmp/aws-lc-build INSTALL_PATH=/opt/awslc echo "[*] Checking debian version" . /etc/os-release if [[ ! $VERSION_ID -eq 13 ]]; then echo "[!] This must be run on Debian Trixie" exit 1 fi echo echo "[*] fetching dependencies..." apt-get update apt-get install -y git cmake ninja-build perl clang perl golang tree echo echo "[*] cloning awslc in ${CODE_PATH}" if [[ -d ${CODE_PATH} ]]; then echo "[!] removing awslc code in ${CODE_PATH}" rm -fr "${CODE_PATH}" fi git clone --depth=1 https://github.com/aws/aws-lc.git "${CODE_PATH}" echo echo "[*] building awslc (build path: ${BUILD_PATH}, destination: ${INSTALL_PATH})" if [[ -d ${BUILD_PATH} ]]; then echo "[!] removing awslc build dir ${BUILD_PATH}" rm -fr "${BUILD_PATH}" fi mkdir -p "${BUILD_PATH}" cd "${BUILD_PATH}" cmake -GNinja -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX="${INSTALL_PATH}" -DBUILD_SHARED_LIBS=1 "${CODE_PATH}" echo # echo "[*] running tests" # ninja run_tests # echo echo "[*] installing into ${INSTALL_PATH}" ninja install tree -L 2 "${INSTALL_PATH}" echo echo "[*] awslc installed in ${INSTALL_PATH}" </syntaxhighlight> ==== Step 2 - script to fetch and compile haproxy against awslc ==== <syntaxhighlight lang="bash"> #!/bin/bash set -e HAPROXY_DEB_PATH=/tmp/haproxy-deb HAPROXY_REPO=https://gitlab.wikimedia.org/fabfur/haproxy-awslc.git # Standard path for aws-lc AWSLC_PATH=/opt/awslc echo "[*] Checking debian version" . /etc/os-release if [[ ! $VERSION_ID -eq 13 ]]; then echo "[!] This must be run on Debian Trixie" exit 1 fi echo echo "[*] fetching dependencies ..." apt-get update apt-get install -y devscripts git echo echo "[*] downloading haproxy source package in ${HAPROXY_DEB_PATH}" if [[ -d "${HAPROXY_DEB_PATH}" ]]; then echo "[!] 
removing ${HAPROXY_DEB_PATH}" rm -fr "${HAPROXY_DEB_PATH}" fi mkdir "${HAPROXY_DEB_PATH}" cd "${HAPROXY_DEB_PATH}" git clone --depth=1 "${HAPROXY_REPO}" "${HAPROXY_DEB_PATH}/haproxy-awslc" echo echo "[*] Building haproxy binary package against awslc (${AWSLC_PATH})" cd "${HAPROXY_DEB_PATH}/haproxy-awslc" dpkg-buildpackage -b cd .. echo "[*] Debian packages in ${HAPROXY_DEB_PATH}" echo echo "[*] Verify binaries dependencies" ldd "${HAPROXY_DEB_PATH}/haproxy-awslc/debian/haproxy/usr/sbin/haproxy" echo </syntaxhighlight> ==== Step 3 ==== Test with sample configuration and self-signed certs:<syntaxhighlight lang="nginx"> global log stdout local0 notice #log /dev/log local1 notice #chroot /var/lib/haproxy #stats socket /run/haproxy/admin.sock mode 660 level admin stats timeout 30s daemon # Default SSL material locations ca-base /etc/ssl/certs crt-base /etc/ssl/private # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384 ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256 ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets defaults log global mode http option httplog option dontlognull timeout connect 5000 timeout client 50000 timeout server 50000 frontend https mode http bind :443 ssl crt /opt/test-configuration/snakeoil.pem http-request return status 200 content-type "text/plain" string "OK!" 
</syntaxhighlight>Different haproxy version (binaries) run w/ <code>haproxy -V -db -f /opt/test-configuration/haproxy.cfg</code> ==== Preliminary results ==== On local container, using the compiled binary haproxy with the above configuration and the benchmark-curl.py script (roughly <code>curl -s -Z --parallel-max <concurrency> -w "%{time_starttransfer}\\n" --insecure --no-sessionid -o /dev/null <URL></code>) ===== Debian Trixie, Haproxy 2.9.9, OpenSSL 3.5 ===== * 1000 requests, concurrency 100: <syntaxhighlight lang="text"> Max time: 5.718 ms p50 p75 p95 p99 p99.9 p99.99 0.023 0.024 4.133 5.159 5.338 5.680 </syntaxhighlight> * 10k requests, concurrency 1000: <syntaxhighlight lang="text"> Max time: 37.359 ms p50 p75 p95 p99 p99.9 p99.99 0.023 0.024 0.051 9.658 11.786 15.090 </syntaxhighlight> ===== Debian trixie, Haproxy 2.9.9, AwsLC ===== * 1000 requests, concurrency 100: <syntaxhighlight lang="text"> Max time: 5.666 ms p50 p75 p95 p99 p99.9 p99.99 0.029 0.047 3.878 4.859 5.066 5.606 </syntaxhighlight> * 10k requests, concurrency 1000: <syntaxhighlight lang="text"> Max time: 36.490 ms p50 p75 p95 p99 p99.9 p99.99 0.023 0.024 1.424 11.396 15.303 15.772 </syntaxhighlight> ===== Debian Bullseye, Haproxy 2.9.9. 
OpenSSL 1.1.1 ===== * 1000 requests, concurrency 100 <syntaxhighlight lang="text"> Max time: 5.217 ms p50 p75 p95 p99 p99.9 p99.99 1.240 3.028 3.673 3.953 4.043 5.100 </syntaxhighlight> <syntaxhighlight lang="text"> Max time: 4.364 ms p50 p75 p95 p99 p99.9 p99.99 1.781 1.900 2.167 2.530 2.885 3.012 </syntaxhighlight> {{collapse bottom}} oqaut53cc0k1hpfp7t2l5za70p6bsia Test Kitchen/Regulation section 0 459269 2398822 2365674 2026-04-03T17:12:05Z JVanderhoop-WMF 41069 Corrected tier numbering 2398822 wikitext text/x-wiki The Regulation section lets you define the risk level of the instrument/experiment in the [https://mpic.wikimedia.org Test Kitchen UI], along with its security and legal review (if needed), according to the [https://foundation.wikimedia.org/wiki/Legal:Data_Collection_Guidelines Data Collection Guidelines]. [[File:XLab Regulation section.png|thumb|Test Kitchen UI Regulation section]] Note that the instrument/experiment cannot be activated until this section is completely filled in. However, the instrument/experiment can be registered before its risk level, and the corresponding security and legal review if needed, have been defined. The risk level field can be set to <code>Risk assessment pending</code>, which allows you to register the instrument/experiment (but not activate it) while you work out the required details. The Regulation section can be completed later to set the risk level and the security and legal review (when needed). Once that is done, the instrument/experiment can be activated. {| class="wikitable" |+ Summary of the Regulation section requirements |- ! Risk level !! Security and legal review !! 
Observations |- | Risk assessment pending || Not required || The user can save the instrument/experiment but it cannot be activated |- | Tier 3: Low risk || Not required || No further requirements apply to save or activate the instrument/experiment |- | Tier 2: Medium risk || Required || The user must provide a link to the security and legal review to be able to save the instrument/experiment |- | Tier 1: High risk || Required || The user must provide a link to the security and legal review to be able to save the instrument/experiment |} In addition to the above, if you are registering an instrument, some [[Test Kitchen/Contextual_attributes#Privacy_considerations|privacy considerations]] related to the selected contextual attributes and risk level must be taken into account to pass the validation process. == Risk level == According to the [https://foundation.wikimedia.org/wiki/Legal:Data_Collection_Guidelines Data Collection Guidelines], there are three risk levels: Low risk, Medium risk, High risk. Note that the selected schema and contextual attributes may affect the required risk level of your instrument/experiment, because its value depends on the collected data. When registering an instrument, the selected contextual attributes may directly affect the required risk level. Some combinations of them increase the required risk level. If you want to know more about this, take a look at [[Test Kitchen/Contextual_attributes#Privacy_considerations|the privacy considerations]] regarding contextual attributes. In addition to that, when registering and configuring your instrument, [https://mpic.wikimedia.org Test Kitchen] will offer you some guidance and validation to help you through the process. If you are creating an experiment using <code>product_metrics.web_base</code> as the stream and <code>analytics/product_metrics/web/base</code> as the schema, you can set the risk level to <code>Tier 3: Low risk</code>. 
That stream configuration was already reviewed by the Legal, Security, Trust and Safety team. [[File:Regulation section risk level check.png|thumb|Test Kitchen UI information message to warn about the required risk level based on the selected contextual attributes]] == Security and legal review == A security and legal review is needed for instruments or experiments when the risk level is set as <code>Tier 1: High risk</code> or <code>Tier 2: Medium risk</code>. A link to that review must be filled for those cases. mkotzlms0t7d08057jhmn529ezt3u8c Deployments/Archive/2026/03 0 459888 2398839 2396600 2026-04-04T02:00:27Z DeploymentCalendarTool 20896 Add last week 2398839 wikitext text/x-wiki ==Week of March 02== ==={{Deployment_day|date=2026-03-01}}=== {{Deployment calendar event card |when=2026-03-01 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-03-02}}=== {{Deployment calendar event card |when=2026-03-02 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|katherine_g|katherine_g}} {{deploy|type=config|gerrit=1240672|title=Enable revert risk filters for first batch of wikis: < 1000 monthly edits|status=}} - {{phabricator|T411485}} {{ircnick|matthiasmullie|Matthias}} {{deploy|type=1.46.0-wmf.17|gerrit=1245265|title=Limit additional whitespace to sticky header version only|status=}} - {{phabricator|T416598}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.46.0-wmf.17|gerrit=1246904|title=HCaptchaEnterpriseHealthChecker: Add configurable retry count and delay|status=}} - {{phabricator|T418477}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-02 03:00 
SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-02 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|itamarWMDE|itamarWMDE}} {{deploy|type=config|gerrit=1245364|title=Add configurations for graphql usage survey and its pipeline tests|status=}} - {{phabricator|T414476}} {{ircnick|kostajh|kostajh}} {{deploy|type=config|gerrit=1247057|title=IPInfo: Set log level to "info"|status=}} - {{phabricator|T374718}} {{ircnick|anzx|anzx}} {{deploy|type=config|gerrit=1247063|title=lawiki: add Adumbratio (draft) namespace|status=}} - {{phabricator|T418706}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-02 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-03-02 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-03-02 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-02 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-03-02 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|danisztls|Daniel de Souza}} {{deploy|type=config|gerrit=1247107|title=Undeploy Comparative Reader Research survey on eswiki|status=}} - {{phabricator|T417834}} {{deploy|type=config|gerrit=1247105|title=Undeploy Comparative Reader Research survey on enwiki|status=}} - {{phabricator|T417829}} {{ircnick|Kemayo|David L}} {{deploy|type=config|gerrit=1240721|title=Stop PasteCheck A/B test|status=}} - {{phabricator|T417429}} {{deploy|type=config|gerrit=1243990|title=Suggestion Mode: add values for suggestion feedback properties|status=}} - {{phabricator|T401739}} {{ircnick|AaronSchulz|AaronSchulz}} {{deploy|type=config|gerrit=1242613|title=Add growthexperiments.v0 to $wgRestSandboxSpecs|status=}} - {{phabricator|T414470}} {{ircnick|Pppery|Pppery}} {{deploy|type=config|gerrit=1226024|title=Add Comments namespace for shnwikinews|status=}} - {{phabricator|T414403}} {{ircnick|RoanKattouw|RoanKattouw}} {{deploy|type=1.46.0-wmf.17|gerrit=1247149|title=ApiCSPReport: Use structured logging for CSP reports|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-02 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-03-02 16:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-02 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.46.0-wmf.18</code> }} {{Deployment calendar event card |when=2026-03-02 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.46.0-wmf.18</code> to testwikis }} {{Deployment calendar event card |when=2026-03-02 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-03-02 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-02 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-03}}=== {{Deployment calendar event card |when=2026-03-03 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-03 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-03 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-03 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|nya_1F616EMO|nya_1F616EMO}} {{deploy|type=config|gerrit=1244373|title=zhwiki: Remove all rights from accountcreator|status=}} - {{phabricator|T418089}} {{ircnick|jakob_WMDE|jakob_WMDE}} {{deploy|type=config|gerrit=1247576|title=Enable Wikibase GraphQL on test.wikidata.org|status=}} - {{phabricator|T417619}} {{deploy|type=config|gerrit=1247577|title=Enable Wikibase GraphQL on production wikidata.org|status=}} - {{phabricator|T417619}} {{ircnick|edsanders|edsanders}} {{deploy|type=1.46.0-wmf.18|gerrit=1247578|title=PasteCheck: Enable by default|status=}} - {{phabricator|T405127}} {{deploy|type=config|gerrit=1240716|title=Remove Editing-related config for special wikis|status=}} - {{phabricator|T400063}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-03 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-03-03 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-03 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-03-03 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-03 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-03 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|jeena|Jeena}}, {{ircnick|dduvall|Dan}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.17->1.46.0-wmf.18|1.46.0-wmf.17|1.46.0-wmf.17}} * group0 to [[mw:MediaWiki_1.46/wmf.18|1.46.0-wmf.18]] * '''Blockers: {{phabricator|T413809}}''' }} {{Deployment calendar event card |when=2026-03-03 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|mooeypoo|Moriel Schottlender}} {{deploy|type=config|gerrit=1244748|title=REST: show the beta Attribution API in the REST Sandbox|status=}} - {{phabricator|T418522}} {{deploy|type=config|gerrit=1247652|title=Remove redundant mw-extra wgRestSandboxSpecs entry|status=}} {{ircnick|tgr|Gergő}} 
{{deploy|type=1.46.0-wmf.17|gerrit=1247689|title=Do not invalidate anon sessions with non-anon JWT cookies|status=}} - {{phabricator|T415007}} {{deploy|type=1.46.0-wmf.18|gerrit=1247690|title=Do not invalidate anon sessions with non-anon JWT cookies|status=}} - {{phabricator|T415007}} {{deploy|type=config|gerrit=1247596|title=Enable JWT session cookie for bot passwords (all wikis) (attempt #2)|status=}} - {{phabricator|T415007}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-03 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-03 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-03-04}}=== {{Deployment calendar event card |when=2026-03-04 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|nya_1F616EMO|nya_1F616EMO}} {{deploy|type=config|gerrit=1244373|title=zhwiki: Remove all rights from accountcreator|status=}} - {{phabricator|T418089}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-04 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-04 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-03-04 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|nya_1F616EMO|nya_1F616EMO}} {{deploy|type=config|gerrit=1244373|title=zhwiki: Remove all rights from accountcreator|status=}} - {{phabricator|T418089}} {{ircnick|Sergi0|Sergio Gimeno}} {{deploy|type=config|gerrit=1247566|title=Enable new HTML confirmation emails for all|status=}} - {{phabricator|T416748}} {{ircnick|tgr|Gergő}} {{deploy|type=config|gerrit=1248000|title=Revert "Enable JWT session cookie for bot passwords (all wikis) (attempt #2)"|status=}} - {{phabricator|T415007}} {{phabricator|T418999}} {{ircnick|Dreamy_Jazz|WBrown (WMF)}} {{deploy|type=config|gerrit=1248008|title=Define $wgWikimediaMessagesHasLiquidThreadsLogs|status=}} - {{phabricator|T417425}} {{deploy|type=1.46.0-wmf.18|gerrit=1248009|title=Hooks: Fix liquidthreads log type definition bugs|status=}} - {{phabricator|T417425}} {{phabricator|T419006}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-04 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-03-04 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}}{{Deployment calendar event card |what=DC Switchover Live Test (T418133) |when=2026-03-04 16:00 UTC |window=DC Switchover Live Test |who=Blake (bjensen) }}{{Deployment calendar event card |when=2026-03-04 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who={{ircnick|swfrench-wmf}} |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. * Shellbox updates. }} {{Deployment calendar event card |when=2026-03-04 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|jeena|Jeena}}, {{ircnick|dduvall|Dan}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.18|1.46.0-wmf.17->1.46.0-wmf.18|1.46.0-wmf.17}} * group1 to [[mw:MediaWiki_1.46/wmf.18|1.46.0-wmf.18]] * '''Blockers: {{phabricator|T413809}}''' }} {{Deployment calendar event card |when=2026-03-04 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|cwhite|cwhite}} {{deploy|type=config|gerrit=1245473|title=logging: set poolcounter channel log level to info|status=}} - {{phabricator|T418612}} {{ircnick|tgr|Gergő}} {{deploy|type=config|gerrit=1248007|title=Fix $wgJwtSessionCookieIssuer|status=}} - {{phabricator|T415007}} {{phabricator|T418999}} {{deploy|type=config|gerrit=1248012|title=Enable JWT session cookie for bot passwords (all wikis) (attempt #3)|status=}} - {{phabricator|T415007}} {{phabricator|T418999}} {{ircnick|cjming|cjming}} {{deploy|type=1.46.0-wmf.18|gerrit=1248081|title=Add synthetic AAA experiment|status=}} - {{phabricator|T418614}} {{deploy|type=1.46.0-wmf.17|gerrit=1248080|title=Add synthetic AAA experiment|status=}} - 
{{phabricator|T418614}} {{ircnick|ebernhardson|ebernhardson}} {{deploy|type=1.46.0-wmf.18|gerrit=1248084|title=Introduce a Semantic Search query route and builder|status=}} - {{phabricator|T413969}} {{deploy|type=1.46.0-wmf.18|gerrit=1248085|title=Wire up semantic query building|status=}} - {{phabricator|T413969}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-04 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-03-04 15:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-04 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-04 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-05}}=== {{Deployment calendar event card |when=2026-03-05 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|kipfel|Stang}} {{deploy|type=config|gerrit=1248314|title=Revert "zhwiki: Add 2026 CNY celebration logos"|status=}} - {{phabricator|T417240}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-05 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-05 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-05 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=config|gerrit=1247651|title=Enable confirmemail logstash channel|status=}} - {{phabricator|T415902}} {{deploy|type=1.46.0-wmf.18|gerrit=1248075|title=Confirmemail: Log delay between email sent and confirmation|status=}} - {{phabricator|T415902}} {{ircnick|ebernhardson|ebernhardson}} {{deploy|type=config|gerrit=1244713|title=cirrus: Add semantic search test cluster|status=}} - {{phabricator|T413969}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-05 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-03-05 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-05 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... 
}} {{Deployment calendar event card |when=2026-03-05 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-05 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|jeena|Jeena}}, {{ircnick|dduvall|Dan}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.18|1.46.0-wmf.18|1.46.0-wmf.17->1.46.0-wmf.18}} * group2 to [[mw:MediaWiki_1.46/wmf.18|1.46.0-wmf.18]] * '''Blockers: {{phabricator|T413809}}''' }} {{Deployment calendar event card |when=2026-03-05 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|cscott|C. 
Scott Ananian}} {{deploy|type=config|gerrit=1247119|title=Enable parser survey for opted-out users on German/French/Polish wikis|status=}} - {{phabricator|T414852}} {{ircnick|ebernhardson|ebernhardson}} {{deploy|type=config|gerrit=1248508|title=cirrus: Align semanticsearch cluster group name with routing|status=}} - {{phabricator|T413969}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-05 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-05 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-03-06}}=== {{Deployment calendar event card |when=2026-03-06 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-03-06 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-03-07}}=== {{Deployment calendar event card |when=2026-03-07 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of March 09== ==={{Deployment_day|date=2026-03-08}}=== {{Deployment calendar event card |when=2026-03-08 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} ==={{Deployment_day|date=2026-03-09}}=== {{Deployment calendar event card |when=2026-03-09 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|Msz2001|MSzwarc-WMF}} {{deploy|type=1.46.0-wmf.18|gerrit=1248806|title=Add a script to send mandatory 2FA Echo notification|status=}} - {{phabricator|T419111}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-09 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-09 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=1.46.0-wmf.18|gerrit=1248075|title=Confirmemail: Log delay between email sent and confirmation|status=}} - {{phabricator|T415902}} {{deploy|type=config|gerrit=1247651|title=Enable confirmemail logstash channel|status=}} - {{phabricator|T415902}} {{ircnick|phuedx|Sam Smith}} {{deploy|type=1.46.0-wmf.18|gerrit=1249243|title=JS SDK: Add getExperimentByPrefix()|status=}} - {{phabricator|T419191}} {{deploy|type=1.46.0-wmf.18|gerrit=1249242|title=ext.wikimediaEvents: pageVisit -> loggedOutReaderRetention|status=}} - {{phabricator|T419191}} {{deploy|type=config|gerrit=1249262|title=Disable MetricsPlatform extension|status=}} - {{phabricator|T416865}} {{ircnick|Msz2001|MSzwarc-WMF}} 
{{deploy|type=1.46.0-wmf.18|gerrit=1249291|title=Hide 2fa-warning Echo category from preferences|status=}} - {{phabricator|T419111}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-09 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-03-09 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-03-09 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what={{ircnick|herron|herron}}, {{ircnick|swfrench-wmf}} * {{deploy|type=config|gerrit=1249332|title=udp2log: switch to new hosts|status=}} - {{phabricator|T404334}} * Pilot new envoy drain features - {{phabricator|T364245}} }} {{Deployment calendar event card |when=2026-03-09 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... 
}} {{Deployment calendar event card |when=2026-03-09 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|tgr|Gergő}} {{deploy|type=config|gerrit=1235552|title=Migrate EmailAuth, step 2|status=not done}} - {{phabricator|T404334}} * remove private code for [[phab:T397244|T397244]] {{ircnick|AaronSchulz|AaronSchulz}} {{deploy|type=config|gerrit=1249363|title=Remove redundant math spec file from wwwportal|status=}} - {{phabricator|T418188}} {{ircnick|anzx|anzx}} {{deploy|type=config|gerrit=1249316|title=lift IP cap for womens month editathon|status=}} - {{phabricator|T419109}} {{deploy|type=config|gerrit=1249035|title=kaiwiki: add logo, stiename, projectnamespace and timezone|status=not done}} - {{phabricator|T414237}} {{ircnick|danisztls|Daniel de Souza}} {{deploy|type=config|gerrit=1249370|title=Pre-deploy participant recruitment survey on ptwiki and trwiki|status=}} - {{phabricator|T419275}} {{ircnick|cscott|C. Scott Ananian}} {{deploy|type=config|gerrit=1247119|title=Enable parser survey for opted-out users on German/French/Polish wikis|status=}} - {{phabricator|T414852}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-09 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-03-09 16:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-09 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.46.0-wmf.19</code> }} {{Deployment calendar event card |when=2026-03-09 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.46.0-wmf.19</code> to testwikis }} {{Deployment calendar event card |when=2026-03-09 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-03-09 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-09 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-10}}=== {{Deployment calendar event card |when=2026-03-10 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-10 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-10 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-10 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Msz2001|MSzwarc-WMF}} {{deploy|type=config|gerrit=1249903|title=Require 2FA from 6 other user groups ($wgRestrictedGroups)|status=}} - {{phabricator|T418580}} {{ircnick|abijeet|abijeet}} {{deploy|type=1.46.0-wmf.19|gerrit=1249937|title=Re-add correct namespace for translatable pages|status=}} - {{phabricator|T419294}} {{ircnick|anzx|anzx}} {{deploy|type=config|gerrit=1249035|title=kaiwiki: add logo, stiename, projectnamespace and timezone|status=}} - {{phabricator|T414237}} {{ircnick|Dreamy_Jazz|WBrown (WMF)}} * Private code changes {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-10 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-03-10 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-10 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-03-10 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-10 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who={{ircnick|swfrench-wmf}} |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. * Further envoy drain testing in mw-debug - {{phabricator|T364245}} }} {{Deployment calendar event card |when=2026-03-10 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jeena|Jeena}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.18->1.46.0-wmf.19|1.46.0-wmf.18|1.46.0-wmf.18}} * group0 to [[mw:MediaWiki_1.46/wmf.19|1.46.0-wmf.19]] * '''Blockers: {{phabricator|T413810}}''' }} {{Deployment calendar event card |when=2026-03-10 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|danisztls|Daniel de Souza}} {{deploy|type=config|gerrit=1249983|title=Deploy participant recruitment survey on ptwiki and trwiki|status=d}} - {{phabricator|T419275}} {{ircnick|cscott|C. 
Scott Ananian}} {{deploy|type=1.46.0-wmf.19|gerrit=1250007|title=Enables legacy processing in ParserOutputPostCacheTransform when cached|status=d}} - {{phabricator|T372592}} {{deploy|type=1.46.0-wmf.19|gerrit=1250015|title=Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode|status=d}} - {{phabricator|T416616}} {{phabricator|T416540}} {{phabricator|T419439}} {{ircnick|James_F|James_F}} {{deploy|type=config|gerrit=1238733|title=wikifunctions: Drop temporary WikifunctionsEnableHTMLOutput flag|status=d}} - {{phabricator|T397402}} {{deploy|type=config|gerrit=1238734|title=wikifunctions: Drop temporary WikifunctionsEnableWikidataInputTypes flag|status=d}} - {{phabricator|T397403}} {{deploy|type=config|gerrit=1249393|title=build: Upgrade mediawiki-phan-config from 0.18.0 to 0.20.0|status=d}} - {{phabricator|T419476}} {{deploy|type=config|gerrit=1249394|title=build: Upgrade mediawiki-codesniffer from 49.0.0 to 50.0.0|status=d}} {{deploy|type=config|gerrit=1249395|title=build: Upgrade symfony/yaml from 7.4.0 to 7.4.6 and alpha-sort|status=d}} {{ircnick|bwang|bwang}} {{deploy|type=config|gerrit=1240012|title=Enable personal main menu to all users in minerva|status=d}} - {{phabricator|T413912}} {{ircnick|tgr|Gergő}} {{deploy|type=config|gerrit=1235552|title=Migrate EmailAuth, step 2|status=}} - {{phabricator|T404334}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-10 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-10 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-03-11}}=== {{Deployment calendar event card |when=2026-03-11 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|katherine_g|katherine_g}} {{deploy|type=config|gerrit=1247639|title=Enable rr-ml AutoModerator CC Set AutoModeratorMultiLingualRevertRisk with available wikis|status=}} - {{phabricator|T400727}} {{ircnick|Msz2001|MSzwarc-WMF}} {{deploy|type=1.46.0-wmf.19|gerrit=1249921|title=Display list of 2FA-req. groups on AccountSecurity for 2FA-less users|status=}} - {{phabricator|T419422}} {{deploy|type=1.46.0-wmf.19|gerrit=1250066|title=Send2FAWarningNotifications: Support reading users from file|status=}} - {{phabricator|T419111}} {{deploy|type=config|gerrit=1250426|title=Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-11 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-11 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-03-11 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1247547|title=Remove `MetricsPlatform` configuration from production|status=}} - {{phabricator|T416865}} {{ircnick|jdlrobson|jdlrobson}} {{deploy|type=config|gerrit=1250566|title=Restore advanced main menu for AMC|status=}} - {{phabricator|T413912}} {{deploy|type=1.46.0-wmf.19|gerrit=1250568|title=Fix pinnableElement export|status=}} - {{phabricator|T419620}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-11 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-03-11 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-03-11 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who={{ircnick|swfrench-wmf}} |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
* Believe it or not, more envoy drain testing (mw-api-int, mw-web canaries) - {{phabricator|T364245}} }} {{Deployment calendar event card |when=2026-03-11 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jeena|Jeena}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.19|1.46.0-wmf.18->1.46.0-wmf.19|1.46.0-wmf.18}} * group1 to [[mw:MediaWiki_1.46/wmf.19|1.46.0-wmf.19]] * '''Blockers: {{phabricator|T413810}}''' }} {{Deployment calendar event card |when=2026-03-11 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|JSherman|Jsn.sherman}} {{deploy|type=1.46.0-wmf.19|gerrit=1250581|title=riskyArticleEdits: show page descriptions|status=d}} - {{phabricator|T419442}} {{deploy|type=1.46.0-wmf.19|gerrit=1250582|title=Fix Instrumentation on mobile view|status=d}} - {{phabricator|T419517}} {{ircnick|sfaci|sfaci}} {{deploy|type=1.46.0-wmf.19|gerrit=1250632|title=ext.wikimediaEvents: Updated Test Kitchen impact test experiment|status=d}} - {{phabricator|T407570}} {{ircnick|bvibber|bvibber}} {{deploy|type=1.46.0-wmf.18|gerrit=1250647|title=Revert "Fix for temp section open during slow loads on Parsoid"|status=d}} - {{phabricator|T416063}} {{phabricator|T419170}} {{phabricator|T419721}} {{deploy|type=1.46.0-wmf.19|gerrit=1250648|title=Revert "Fix for temp section open during slow loads on Parsoid"|status=d}} - {{phabricator|T416063}} {{phabricator|T419170}} {{phabricator|T419721}} {{ircnick|anzx|anzx}} {{deploy|type=config|gerrit=1250579|title=urwikisource: add logo, sitename and projectnamespace|status=d}} - {{phabricator|T415974}} {{ircnick|arlolra|Arlolra}} 
{{deploy|type=1.46.0-wmf.18|gerrit=1250665|title=Show category index when no category selected on Special:LintTemplateErrors|status=}} - {{phabricator|T417363}} {{deploy|type=1.46.0-wmf.19|gerrit=1250666|title=Show category index when no category selected on Special:LintTemplateErrors|status=}} - {{phabricator|T417363}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-11 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-03-11 15:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-11 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-11 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-12}}=== {{Deployment calendar event card |when=2026-03-12 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-12 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-12 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-12 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|katherine_g|katherine_g}} {{deploy|type=1.46.0-wmf.19|gerrit=1250656|title=Add multilingual revert risk host header for LiftWing requests|status=}} - {{phabricator|T419718}} {{ircnick|edsanders|edsanders}} {{deploy|type=config|gerrit=1251005|title=Deploy EditCheck suggestion mode at all Wikipedias|status=}} - {{phabricator|T415320}} {{ircnick|phuedx|Sam Smith}} {{deploy|type=1.46.0-wmf.18|gerrit=1251031|title=ext.testKitchen: Depend on mediawiki.user module|status=}} {{deploy|type=1.46.0-wmf.19|gerrit=1251032|title=ext.testKitchen: Depend on mediawiki.user module|status=}} {{ircnick|matthiasmullie|Matthias}} {{deploy|type=1.46.0-wmf.18|gerrit=1251036|title=Remove queueing logic|status=}} - {{phabricator|T419587}} {{deploy|type=1.46.0-wmf.19|gerrit=1251037|title=Remove queueing logic|status=}} - {{phabricator|T419587}} {{deploy|type=1.46.0-wmf.18|gerrit=1251034|title=Update CSS selector for Mobile TOC button|status=}} - {{phabricator|T419587}} {{deploy|type=1.46.0-wmf.19|gerrit=1251035|title=Update CSS selector for Mobile TOC button|status=}} - {{phabricator|T419587}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-12 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-12 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-12 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-03-12 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-12 11:00 SF |length=2 |window=MediaWiki train - Utc-7 Version |who={{ircnick|brennen|Brennen}}, {{ircnick|jeena|Jeena}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.19|1.46.0-wmf.19|1.46.0-wmf.18->1.46.0-wmf.19}} * group2 to [[mw:MediaWiki_1.46/wmf.19|1.46.0-wmf.19]] * '''Blockers: {{phabricator|T413810}}''' }} {{Deployment calendar event card |when=2026-03-12 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|JSherman|Jsn.sherman}} {{deploy|type=config|gerrit=1249364|title=PersonalDashboard: enable CTA for pilot wikis|status=}} - {{phabricator|T418613}} {{deploy|type=config|gerrit=1251168|title=PersonalDashboard: enable CTA for pilot wikis|status=}} - {{phabricator|T418613}} {{ircnick|tgr|Gergő}} {{deploy|type=1.46.0-wmf.18|gerrit=1251087|title=Set 'sub' JWT field in client credentials access 
tokens|status=}} - {{phabricator|T417278}} {{deploy|type=1.46.0-wmf.19|gerrit=1251088|title=Set 'sub' JWT field in client credentials access tokens|status=}} - {{phabricator|T417278}} {{deploy|type=1.46.0-wmf.19|gerrit=1251152|title=Use 'alwaysShowLogin' query parameter during login|status=}} - {{phabricator|T419723}} {{deploy|type=1.46.0-wmf.19|gerrit=1251150|title=login: Add 'alwaysShowLogin' login URL parameter|status=}} - {{phabricator|T419723}} {{deploy|type=1.46.0-wmf.19|gerrit=1251106|title=phpunit: Avoid unnecessary writes in generatePHPUnitConfig.php|status=}} - {{phabricator|T419107}} {{ircnick|danisztls|Daniel de Souza}} {{deploy|type=config|gerrit=1251098|title=Deploy participant recruitment survey on frwiki|status=}} - {{phabricator|T419778}} {{ircnick|cscott|C. Scott Ananian}} {{deploy|type=config|gerrit=1250750|title=Enable parser survey for opted-out users on ru/pt/ja/id wikis|status=}} - {{phabricator|T414852}} {{deploy|type=1.46.0-wmf.19|gerrit=1251173|title=Revert "Move post-processing of flaggedrevs views inside FlaggablePageView"|status=}} {{ircnick|_Gerges|Gerges}} {{deploy|type=config|gerrit=1251140|title=[arwikiquote] add namespace alias for NS_PROJECT|status=}} - {{phabricator|T419828}} {{ircnick|Nemoralis|Nemoralis}} {{deploy|type=config|gerrit=1251164|title=Increase IP cap limit for azwiki|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-12 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-12 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-03-13}}=== {{Deployment calendar event card |when=2026-03-13 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-03-13 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-03-14}}=== {{Deployment calendar event card |when=2026-03-14 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of March 16== ==={{Deployment_day|date=2026-03-15}}=== {{Deployment calendar event card |when=2026-03-15 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-03-16}}=== {{Deployment calendar event card |when=2026-03-16 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|katherine_g|katherine_g}} {{deploy|type=config|gerrit=1251276|title=Fix broken survey links on PersonalDashboard|status=}} - {{phabricator|T419950}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1251200|title=ptwiki: Enable block action for the abuse filter|status=}} - {{phabricator|T419312}} {{ircnick|anzx|anzx}} {{deploy|type=config|gerrit=1253046|title=bowiki: update logos|status=}} - {{phabricator|T419268}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-16 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki 
infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-16 04:00 SF |length=1 |window=gerrit primary reboot |who={{ircnick|arnaudb}} |what=Kernel reboot required for gerrit }} {{Deployment calendar event card |when=2026-03-16 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|James_F|James_F}} {{deploy|type=1.46.0-wmf.19|gerrit=1251487|title=Replace direct BagOStuff with WANObjectCache|status=d}} - {{phabricator|T419666}} {{ircnick|Msz2001|MSzwarc-WMF}} {{deploy|type=1.46.0-wmf.19|gerrit=1253423|title=Always use external actor for interwiki rights logs on target wiki|status=}} - {{phabricator|T6055}} {{ircnick|Sergi0|Sergio Gimeno}} {{deploy|type=1.46.0-wmf.19|gerrit=1253450|title=AccountCreation: track account registrations for WE1.8 experiments|status=}} - {{phabricator|T416100}} {{deploy|type=1.46.0-wmf.19|gerrit=1253461|title=fix(anon warning): remove wring type=signup param|status=}} - {{phabricator|T415160}} {{ircnick|anzx|anzx}} {{deploy|type=config|gerrit=1253046|title=bowiki: update logos|status=}} - {{phabricator|T419268}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-16 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-16 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-03-16 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-16 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-03-16 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|cscott|C. 
Scott Ananian}} {{deploy|type=1.46.0-wmf.19|gerrit=1253551|title=Fix double post-processing in legacy preview case|status=}} - {{phabricator|T419908}} {{ircnick|RoanKattouw|RoanKattouw}} {{deploy|type=config|gerrit=1248665|title=Enable passwordless login in production|status=}} - {{phabricator|T419198}} {{ircnick|MatmaRex|Bartosz}} {{deploy|type=1.46.0-wmf.19|gerrit=1253623|title=Fix client credentials access tokens|status=}} - {{phabricator|T417278}} {{phabricator|T419921}} {{deploy|type=config|gerrit=1253625|title=Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster|status=}} - {{phabricator|T414338}} {{deploy|type=config|gerrit=1253626|title=Configure $wgApiClientErrorSampleRate|status=}} - {{phabricator|T418957}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.46.0-wmf.19|gerrit=1253572|title=Instrument clicks on external links to selected domains|status=}} - {{phabricator|T419837}} {{deploy|type=config|gerrit=1253566|title=Configure external link tracking on 12 wikis (411 ext. domains)|status=}} - {{phabricator|T419837}} {{ircnick|Dreamy_Jazz|WBrown (WMF)}} {{deploy|type=config|gerrit=1251848|title=Disable CheckUser on closed wikis where no checks were ever made|status=}} - {{phabricator|T420062}} {{deploy|type=config|gerrit=1251865|title=Uninstall SecurePoll from closed wikis|status=}} - {{phabricator|T420062}} {{deploy|type=config|gerrit=1251888|title=DiscussionTools: Uninstall wikis closed before permalinks were deployed|status=}} - {{phabricator|T420052}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-16 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. 
}} {{Deployment calendar event card |when=2026-03-16 16:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-16 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.46.0-wmf.20</code> }} {{Deployment calendar event card |when=2026-03-16 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.46.0-wmf.20</code> to testwikis }} {{Deployment calendar event card |when=2026-03-16 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-03-16 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-16 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-17}}=== {{Deployment calendar event card |when=2026-03-17 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-17 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.19->1.46.0-wmf.20|1.46.0-wmf.19|1.46.0-wmf.19}} * group0 to [[mw:MediaWiki_1.46/wmf.20|1.46.0-wmf.20]] * '''Blockers: {{phabricator|T413811}}''' }} {{Deployment calendar event card |when=2026-03-17 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-17 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-17 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|andre|andre}} {{deploy|type=1.46.0-wmf.20|gerrit=1254166|title=Remove misplaced readonly from CategoryViewer::$query|status=}} - {{phabricator|T420315}} {{ircnick|edsanders|edsanders}} {{deploy|type=1.46.0-wmf.19|gerrit=1254189|title=TitleWidget: Prioritise namespace prefix over interwiki prefix|status=}} - {{phabricator|T420288}} {{deploy|type=1.46.0-wmf.20|gerrit=1254190|title=TitleWidget: Prioritise namespace prefix over interwiki prefix|status=}} - {{phabricator|T420288}} {{ircnick|cscott|C. Scott Ananian}} {{deploy|type=config|gerrit=1251610|title=Turn on postprocessing cache for all Parsoid parses|status=}} - {{phabricator|T348255}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-17 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-03-17 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-17 07:30 SF |length=0.333333333333 |window=Create new table for the [[mw:Extension:CampaignEvents|CampaignEvents extension]] |who={{ircnick|Daimona|Daimona}} |what=Create new <code>ce_event_goals</code> table for the CampaignEvents extension on testwiki, test2wiki, officewiki, and metawiki ([[phab:T411433|T411433]]). }} {{Deployment calendar event card |when=2026-03-17 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-03-17 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|phuedx|Sam Smith}} {{deploy|type=puppet|gerrit=1249932|title=mw::maintenance: Remove ExperimentationLab periodic job |status=}} - {{phabricator|T419428}} {{ircnick|Dreamy_Jazz|WBrown (WMF)}} {{deploy|type=puppet|gerrit=1254225|title=mw::maintenance: Disable scripts for closed wikis on various extensions |status=}} - {{phabricator|T420052}} {{phabricator|T420063}} {{phabricator|T420062}} }} {{Deployment calendar event card |when=2026-03-17 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who={{ircnick|swfrench-wmf}} |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
* Deploy envoy drain-on-termination to mw-api-int and mw-web - {{phabricator|T364245}} }} {{Deployment calendar event card |when=2026-03-17 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.19->1.46.0-wmf.20|1.46.0-wmf.19|1.46.0-wmf.19}} * group0 to [[mw:MediaWiki_1.46/wmf.20|1.46.0-wmf.20]] * '''Blockers: {{phabricator|T413811}}''' }} {{Deployment calendar event card |when=2026-03-17 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|aude|aude}} {{deploy|type=config|gerrit=1251309|title=Set wgReadingListsBetaDefaultForNewAccountsAfter for beta cluster|status=}} - {{phabricator|T419163}} {{ircnick|alexsanford|alexsanford}} {{deploy|type=1.46.0-wmf.20|gerrit=1254280|title=Remove notice from login form in popup mode - the core patch that this depends upon was already in the train, this one missed the train because of a CI problem which was since resolved.|status=}} - {{phabricator|T418534}} {{ircnick|RoanKattouw|RoanKattouw}} {{deploy|type=1.46.0-wmf.19|gerrit=1254302|title=Passwordless login: Don't display conditional auth errors|status=}} {{deploy|type=1.46.0-wmf.20|gerrit=1254301|title=Passwordless login: Don't display conditional auth errors|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-17 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past 
the start }} {{Deployment calendar event card |when=2026-03-17 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-03-18}}=== {{Deployment calendar event card |when=2026-03-18 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-18 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.20|1.46.0-wmf.19->1.46.0-wmf.20|1.46.0-wmf.19}} * group1 to [[mw:MediaWiki_1.46/wmf.20|1.46.0-wmf.20]] * '''Blockers: {{phabricator|T413811}}''' }} {{Deployment calendar event card |when=2026-03-18 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-18 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-03-18 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Sergi0|Sergio Gimeno}} {{deploy|type=config|gerrit=1254216|title=GrowthExperiments: increase edit and thanks query limit II|status=}} - {{phabricator|T341599}} {{deploy|type=1.46.0-wmf.20|gerrit=1254895|title=loggedOutWarning: dont set the schema for experiment events|status=}} - {{phabricator|T420451}} {{deploy|type=1.46.0-wmf.19|gerrit=1254894|title=loggedOutWarning: dont set the schema for experiment events|status=}} - {{phabricator|T420451}} {{ircnick|MatmaRex|Bartosz}} {{deploy|type=1.46.0-wmf.19|gerrit=1254891|title=Revert "SpecialPreferences: Use Language Select Widget in language field"|status=}} - {{phabricator|T419895}} {{deploy|type=1.46.0-wmf.20|gerrit=1254890|title=Revert "SpecialPreferences: Use Language Select Widget in language field"|status=}} - {{phabricator|T419895}} {{deploy|type=config|gerrit=1248095|title=filebackend: Remove outdated comment|status=}} {{ircnick|Msz2001|MSzwarc-WMF}} {{deploy|type=config|gerrit=1254876|title=Tweak configuration of external link aggregate usage analysis|status=}} - {{phabricator|T419837}} {{deploy|type=1.46.0-wmf.19|gerrit=1254917|title=Normalize external domain names in click analysis|status=}} - {{phabricator|T419837}} {{deploy|type=1.46.0-wmf.20|gerrit=1254916|title=Normalize external domain names in click analysis|status=}} - {{phabricator|T419837}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card 
|when=2026-03-18 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-03-18 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-03-18 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who={{ircnick|swfrench-wmf}} |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. * Deploy envoy drain-on-termination to mw-api-ext and mw-jobrunner - {{phabricator|T364245}} }} {{Deployment calendar event card |when=2026-03-18 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.20|1.46.0-wmf.19->1.46.0-wmf.20|1.46.0-wmf.19}} * group1 to [[mw:MediaWiki_1.46/wmf.20|1.46.0-wmf.20]] * '''Blockers: {{phabricator|T413811}}''' }} {{Deployment calendar event card |when=2026-03-18 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what= {{ircnick|hector-arroyo|hector-arroyo}} {{deploy|type=config|gerrit=1254889|title=Reapply "hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend"|status=}} - {{phabricator|T419125}} {{ircnick|cscott|C. 
Scott Ananian}} {{deploy|type=1.46.0-wmf.20|gerrit=1254956|title=Limit legacy postprocessing cache to pages where DT does apply|status=}} - {{phabricator|T376183}} {{ircnick|Kemayo|David L}} {{deploy|type=1.46.0-wmf.20|gerrit=1254965|title=Editcheck: fix tagging not happening for non-default checks|status=}} {{ircnick|Pppery|Pppery}} {{deploy|type=config|gerrit=1252684|title=Disable magic links on afwiki|status=}} - {{phabricator|T420142}} {{deploy|type=config|gerrit=1250095|title=Enwikinews: Only enable flaggedRevs in article namespace|status=}} - {{phabricator|T418066}} {{deploy|type=config|gerrit=1242542|title=Generate our own logo thumbnails rather than using MediaWiki's|status=}} - {{phabricator|T414048}} {{ircnick|jdlrobson|jdlrobson}} {{deploy|type=1.46.0-wmf.20|gerrit=1255013|title=Guard for JS null deref on empty Parsoid sections|status=}} - {{phabricator|T419721}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-18 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-03-18 15:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-18 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-18 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-19}}=== {{Deployment calendar event card |when=2026-03-19 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1251200|title=ptwiki: Enable block action for the abuse filter|status=}} - {{phabricator|T419312}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-19 01:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.20|1.46.0-wmf.20|1.46.0-wmf.19->1.46.0-wmf.20}} * group2 to [[mw:MediaWiki_1.46/wmf.20|1.46.0-wmf.20]] * '''Blockers: {{phabricator|T413811}}''' }} {{Deployment calendar event card |when=2026-03-19 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-19 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-19 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|MichaelG_WMF|Michael Grosse (dev @ WMF Growth team)}} {{deploy|type=1.46.0-wmf.20|gerrit=1255686|title=CreateAccount: Add class to aide in instrumentation|status=}} {{deploy|type=1.46.0-wmf.20|gerrit=1255685|title=createAccount: Log exposure and CTRs for account creation experiment|status=}} - {{phabricator|T419916}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-19 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-19 08:00 SF |length=1 |window=Train log triage |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-03-19 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what= {{ircnick|phuedx|Sam Smith}} {{deploy|type=puppet|gerrit=1249932|title=mw::maintenance: Remove ExperimentationLab periodic job |status=}} - {{phabricator|T419428}} {{ircnick|Dreamy_Jazz|WBrown (WMF)}} * {{deploy|type=puppet|gerrit=1255687|title=mw::maintenance: Purge blocks on closed but not preinstall wikis |status=}} - {{phabricator|T420571}} * {{deploy|type=puppet|gerrit=1255694|title=mw::maintenance: Run purgeRecentChanges.php on wikis without CheckUser |status=}} - {{phabricator|T420062}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-19 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-03-19 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-19 11:00 SF |length=2 |window=MediaWiki train - Utc-0+Utc-7 Version (secondary timeslot) |who={{ircnick|andre|Andre}}, {{ircnick|brennen|Brennen}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.20|1.46.0-wmf.20|1.46.0-wmf.19->1.46.0-wmf.20}} * group2 to [[mw:MediaWiki_1.46/wmf.20|1.46.0-wmf.20]] * '''Blockers: {{phabricator|T413811}}''' }} {{Deployment calendar event card |when=2026-03-19 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|arlolra|Arlolra}} {{deploy|type=config|gerrit=1253654|title=Deploy PRV to 13 wikis|status=}} - {{phabricator|T420273}} {{ircnick|katherine_g|katherine_g}} {{deploy|type=config|gerrit=1254865|title=Deploy Extension:PersonalDashboard to English Wikipedia|status=}} - {{phabricator|T418367}} {{ircnick|hector-arroyo|hector-arroyo}} {{deploy|type=1.46.0-wmf.20|gerrit=1255736|title=hcaptcha: Use the global edit key for MobileFrontend edits if present|status=}} - {{phabricator|T420574}} {{ircnick|JSherman|Jsn.sherman}} {{deploy|type=1.46.0-wmf.20|gerrit=1255772|title=Remove local configuration routing and loading|status=}} - {{phabricator|T419835}} {{ircnick|jdlrobson|jdlrobson}} {{deploy|type=1.46.0-wmf.20|gerrit=1255765|title=Implement addListener fallback for older browsers in matchMedia|status=}} - {{phabricator|T419717}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-19 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 
minutes past the start }} {{Deployment calendar event card |when=2026-03-19 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-03-20}}=== {{Deployment calendar event card |when=2026-03-20 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} {{Deployment calendar event card |when=2026-03-20 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-03-21}}=== {{Deployment calendar event card |when=2026-03-21 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of March 23== ==={{Deployment_day|date=2026-03-22}}=== {{Deployment calendar event card |when=2026-03-22 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} ==={{Deployment_day|date=2026-03-23}}=== {{Deployment calendar event card |what=MaxRequestWorkers increase for Gerrit's reverse proxy |when=2026-03-22 22:00 SF |length=2 |window=Gerrit |who= {{ ircnick|arnaudb }} }}{{Deployment calendar event card |when=2026-03-23 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|abijeet|abijeet}} {{deploy|type=config|gerrit=1254149|title=Enable ULS rewrite beta feature|status=}} - {{phabricator|T418187}} {{phabricator|T253303}} {{ircnick|hector-arroyo|hector-arroyo}} {{deploy|type=1.46.0-wmf.20|gerrit=1255736|title=hcaptcha: Use the global edit key for MobileFrontend edits if present|status=}} - {{phabricator|T420574}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-23 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-23 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|hector-arroyo|hector-arroyo}} {{deploy|type=1.46.0-wmf.20|gerrit=1255736|title=hcaptcha: Use the global edit key for MobileFrontend edits if present|status=}} - {{phabricator|T420574}} {{ircnick|Sergi0|Sergio Gimeno}} {{deploy|type=1.46.0-wmf.20|gerrit=1259035|title=fix(WelcomeSurveyHooks): ensure accountJustCreated is always added|status=}} - {{phabricator|T420722}} {{deploy|type=1.46.0-wmf.20|gerrit=1259036|title=tests: add coverage for WelcomeSurveyHooks::onCentralAuthPostLoginRedirect|status=}} - {{phabricator|T420722}} {{deploy|type=1.46.0-wmf.20|gerrit=1259046|title=fix(WelcomeSurveyHooks): ensure accountJustCreated is always added 2|status=}} - {{phabricator|T420722}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-23 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-03-23 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-03-23 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-23 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-03-23 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|alexsanford|alexsanford}} {{deploy|type=config|gerrit=1256472|title=Reduce reauth timeout for editing site JS to 10 minutes|status=d}} - {{phabricator|T419605}} {{ircnick|RoanKattouw|RoanKattouw}} {{deploy|type=config|gerrit=1255847|title=testwiki: Add temporary groups for security testing|status=}} {{ircnick|danisztls|Daniel de Souza}} {{deploy|type=config|gerrit=1254448|title=Undeploy participant recruitment survey on ptwiki|status=d}} - {{phabricator|T419275}} {{deploy|type=config|gerrit=1254450|title=Undeploy participant recruitment survey on trwiki|status=d}} - {{phabricator|T419275}} {{deploy|type=config|gerrit=1254452|title=Undeploy participant recruitment survey on frwiki|status=d}} - {{phabricator|T419778}} {{ircnick|James_F|James_F}} {{deploy|type=config|gerrit=1259085|title=[abstractwiki] Enable the Translate extension|status=d}} - {{phabricator|T420656}} {{deploy|type=config|gerrit=1250113|title=Move testwiki-only Attribution REST API definition to IS|status=d}} {{deploy|type=1.46.0-wmf.20|gerrit=1256394|title=Abstract Wikipedia: Fix API call to get page info|status=d}} - {{phabricator|T420725}} {{ircnick|milimetric|maybe tchin}} {{deploy|type=config|gerrit=1255763|title=testKitchen: Add custom stream name|status=d}} - {{phabricator|T417050}} {{ircnick|cmelo|cmelo}} {{deploy|type=config|gerrit=1259120|title=Enable wgCampaignEventsEnableEventGoals in beta wikis|status=d}} - {{phabricator|T414148}} 
{{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-23 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2026-03-23 16:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-23 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/0.00.0-wmf.0</code> }} {{Deployment calendar event card |when=2026-03-23 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/0.00.0-wmf.0</code> to testwikis }} {{Deployment calendar event card |when=2026-03-23 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-03-23 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-23 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-24}}=== {{Deployment calendar event card |when=2026-03-24 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-24 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.20->1.46.0-wmf.21|1.46.0-wmf.20|1.46.0-wmf.20}} * group0 to [[mw:MediaWiki_1.46/wmf.21|1.46.0-wmf.21]] * '''Blockers: {{phabricator|T420479}}''' }} {{Deployment calendar event card |when=2026-03-24 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-24 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-24 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Daimona|Daimona}} {{deploy|type=config|gerrit=1259231|title=Enable the CampaignEvents extension on all wikibooks|status=}} - {{phabricator|T419597}} {{deploy|type=config|gerrit=1259237|title=Enable $wgCampaignEventsEnableEventGoals in prod wikis|status=}} - {{phabricator|T414149}} {{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1259875|title=search: use the discovery ns record for the semanticsearch cluster|status=}} - {{phabricator|T414484}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-24 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-03-24 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-24 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-03-24 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-24 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-24 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|James_F|James_F}} {{deploy|type=1.46.0-wmf.21|gerrit=1259967|title=Set json object before setting Abstract Wiki Id|status=d}} - {{phabricator|T420916}} {{deploy|type=1.46.0-wmf.21|gerrit=1259994|title=AbstractPreview: apply selected preview language lang/dir to abstract preview body|status=d}} - {{phabricator|T420687}} {{deploy|type=1.46.0-wmf.21|gerrit=1260092|title=AbstractTitle: Handle pageinfo responses without normalized titles|status=d}} - {{phabricator|T420725}} {{deploy|type=config|gerrit=1259992|title=[abstractwiki] Don't list abstract as a langlist entry|status=d}} - {{phabricator|T420654}} {{deploy|type=config|gerrit=1256433|title=[wikifunctions] Drop m.wikifunctions.org from lists, we've not used it for 
years|status=d}} {{deploy|type=config|gerrit=1250114|title=Move GrowthExperiments REST API definition to IS|status=d}} {{deploy|type=config|gerrit=1259993|title=dumpInterwiki: Re-generate to add Abstract Wikipedia (and others)|status=d}} - {{phabricator|T420654}} {{ircnick|Pppery|Pppery}} {{deploy|type=config|gerrit=1242542|title=Generate our own logo thumbnails rather than using MediaWiki's|status=d}} - {{phabricator|T414048}} {{deploy|type=config|gerrit=1250095|title=Enwikinews: Only enable flaggedRevs in article namespace|status=d}} - {{phabricator|T418066}} {{deploy|type=config|gerrit=1252684|title=Disable magic links on afwiki|status=d}} - {{phabricator|T420142}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-24 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-24 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-03-25}}=== {{Deployment calendar event card |what=[https://phabricator.wikimedia.org/T417998 T417998] |when=2026-03-24 22:00 SF |length=2 |window=Gerrit / CDN |who= {{ ircnick|arnaudb }} }}{{Deployment calendar event card |when=2026-03-25 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-25 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.21|1.46.0-wmf.20->1.46.0-wmf.21|1.46.0-wmf.20}} * group1 to [[mw:MediaWiki_1.46/wmf.21|1.46.0-wmf.21]] * '''Blockers: {{phabricator|T420479}}''' }} {{Deployment calendar event card |when=2026-03-25 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-25 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-03-25 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|dcausse|dcausse}} {{deploy|type=config|gerrit=1260045|title=Revert^2 "search: use the discovery ns record for the semanticsearch cluster"|status=}} {{ircnick|awight|awight}} {{deploy|type=config|gerrit=1260614|title=[beta] Kill synthetic refs with feature flag|status=}} - {{phabricator|T421055}} {{ircnick|codenamenoreste|Codename Noreste}} {{deploy|type=config|gerrit=1251193|title=idwiki: Remove unused user groups on Indonesian Wikipedia|status=}} - {{phabricator|T419105}} {{deploy|type=config|gerrit=1251200|title=ptwiki: Enable block action for the abuse filter|status=}} - {{phabricator|T419312}} {{deploy|type=config|gerrit=1256748|title=ptwiki: Add suppressredirect to autoreviewer and rollbacker user groups|status=}} - {{phabricator|T420704}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-25 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-03-25 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-25 15:00 UTC |length=2 |window=Datacenter Switchover (T413974) |who=Blake (bjensen) |what=Datacenter Switchover (T413974) }} {{Deployment calendar event card |when=2026-03-25 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what= {{ircnick|jdlrobson|jdlrobson}} {{deploy|type=config|gerrit=1247073|title=Deploy temporary accounts to ruwiki|status=}} - {{phabricator|T413771}} {{ircnick|AaronSchulz|AaronSchulz}} {{deploy|type=config|gerrit=1259183|title=Add Analytics APIs to the RestSandbox|status=}} - {{phabricator|T419429}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.46.0-wmf.21|gerrit=1260797|title=SuggestedInvestigations: Import session into signal matching job|status=}} - {{phabricator|T421062}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-25 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-03-25 15:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-25 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-25 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-26}}=== {{Deployment calendar event card |when=2026-03-26 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-26 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|hashar|Antoine}}, {{ircnick|andre|Andre}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.21|1.46.0-wmf.21|1.46.0-wmf.20->1.46.0-wmf.21}} * group2 to [[mw:MediaWiki_1.46/wmf.21|1.46.0-wmf.21]] * '''Blockers: {{phabricator|T420479}}''' }} {{Deployment calendar event card |when=2026-03-26 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-26 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-26 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Raine|Raine}} {{deploy|type=config|gerrit=1256384|title=Temporarily add shellbox-icu to $wgShellboxUrls|status=}} - {{phabricator|T419049}} {{phabricator|T419242}} {{phabricator|T419274}} {{ircnick|Sergi0|Sergio Gimeno}} {{deploy|type=config|gerrit=1259132|title=GrowthExperiments: scale edit and thanks query limit to more wikis|status=}} - {{phabricator|T341599}} {{ircnick|anzx|anzx}} {{deploy|type=config|gerrit=1261420|title=cswiki: lift IP cap for editathon|status=}} - {{phabricator|T421305}} * [maintenance script] <code>mwscript-k8s --comment='T421305' --follow -- resetAuthenticationThrottle.php --wiki=cswiki --signup --ip=213.155.243.7</code> {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-26 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}}{{Deployment calendar event card |what=The location of the deployment server is being switched, per: https://wikitech.wikimedia.org/wiki/Switch_Datacenter/DeploymentServer |when=2026-03-26 15:00 UTC |length=1 |window=Deployment server switchover |who=bjensen, jasmine_ }}{{Deployment calendar event card |when=2026-03-26 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|James_F|James}}, {{ircnick|genocation|Genoveva}} {{deploy|type=puppet|gerrit=1256396|title=Enable view urls in abstract.wikipedia.org|status=}} - {{phabricator|T420666}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-26 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-03-26 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-26 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|SCardenasM|SCardenasM}} {{deploy|type=config|gerrit=1256498|title=PersonalDashboard: Add config for Active Discussions|status=}} - {{phabricator|T420785}} {{ircnick|RoanKattouw|RoanKattouw}} {{deploy|type=1.46.0-wmf.21|gerrit=1260834|title=Add Logstash logging for successful passwordless logins|status=}} {{ircnick|Raine|Raine}} {{deploy|type=config|gerrit=1261470|title=Enable $wgTempCategoryCollations for testwiki.|status=}} - {{phabricator|T419274}} {{phabricator|T419049}} {{ircnick|cscott|C. Scott Ananian}} {{deploy|type=config|gerrit=1261515|title=Use prod to serve maps in labs|status=}} - {{phabricator|T420299}} {{ircnick|MatmaRex|Bartosz}} {{deploy|type=1.46.0-wmf.21|gerrit=1261545|title=Wrap 'centralauthtoken' in a JWT|status=}} - {{phabricator|T420280}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-26 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-26 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-03-27}}=== {{Deployment calendar event card |when=2026-03-27 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} {{Deployment calendar event card |when=2026-03-27 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-03-28}}=== {{Deployment calendar event card |when=2026-03-28 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==Week of March 30== ==={{Deployment_day|date=2026-03-29}}=== {{Deployment calendar event card |when=2026-03-29 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} ==={{Deployment_day|date=2026-03-30}}=== {{Deployment calendar event card |when=2026-03-30 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what= {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-30 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-30 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|kostajh|kostajh}} {{deploy|type=1.46.0-wmf.21|gerrit=1264578|title=hCaptcha: Add APCu cache layer to health checker|status=d}} - {{phabricator|T421204}} {{phabricator|T412947}} {{ircnick|Raine|Raine}} {{deploy|type=config|gerrit=1262091|title=Enable $wgTempCategoryCollations for s3 wikis.|status=}} - {{phabricator|T419274}} {{phabricator|T419049}} {{ircnick|MichaelG_WMF|Michael Grosse (dev @ WMF Growth team)}} {{deploy|type=1.46.0-wmf.21|gerrit=1264590|title=instrument(ReviseTone): record start of copyedit session|status=d}} - {{phabricator|T419181}} {{ircnick|James_F|James_F}} {{deploy|type=1.46.0-wmf.21|gerrit=1261477|title=Replace WANObjectCache with new MemcachedWrapper concept|status=d}} - {{phabricator|T419666}} {{deploy|type=1.46.0-wmf.21|gerrit=1262199|title=Fix match case for setting minute, week or month TTL on OrchestratorRequest|status=d}} - {{phabricator|T421475}} {{deploy|type=config|gerrit=1256432|title=Wikifunctions: Switch cache from mcrouter-wikifunctions to special access|status=nd}} - {{phabricator|T419666}} {{ircnick|eileen-m__|eileen-m__}} {{deploy|type=1.46.0-wmf.21|gerrit=1264605|title=Instrumentation: Track clicks for user account menu experiment|status=d}} - {{phabricator|T418053}} {{deploy|type=1.46.0-wmf.21|gerrit=1264625|title=Display create account button in main menu when user is logged out.|status=d}} - {{phabricator|T418053}} {{phabricator|T415647}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-30 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic
start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-03-30 08:30 SF |length=0.5 |window=Wikimedia Portals Update |who={{ircnick|jan_drewniak|Jan Drewniak}} |what=Weekly window for the portals page: https://www.wikipedia.org/ }} {{Deployment calendar event card |when=2026-03-30 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-30 10:00 SF |length=0.5 |window=Wikidata Query Service weekly deploy |who={{ircnick|ryankemper|Ryan}} |what=... }} {{Deployment calendar event card |when=2026-03-30 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=config|gerrit=1261526|title=config: Enable EmailConfirmationBanner on mediawikiwiki|status=}} - {{phabricator|T421366}} {{deploy|type=config|gerrit=1264662|title=config: Enable EmailConfirmationBanner on testwiki|status=}} - {{phabricator|T421366}} {{ircnick|tchin|tchin}} {{deploy|type=config|gerrit=1262303|title=[EventStreamConfig] Add product_metrics.web_base.active_reader_baseline stream|status=}} - {{phabricator|T420621}} {{ircnick|Nemoralis|Nemoralis}} {{deploy|type=config|gerrit=1264652|title=Add delete-redirect to filemovers on Wikimedia Commons|status=}} - {{phabricator|T421373}} {{ircnick|cjming|cjming}} {{deploy|type=config|gerrit=1264653|title=Add TestKitchenExposureResetEpoch config variable|status=}} - {{phabricator|T414738}} {{ircnick|annet|annet}} 
{{deploy|type=config|gerrit=1264630|title=Add event stream for logged-in reader retention experiment|status=}} - {{phabricator|T420490}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-30 14:00 SF |length=2 |window=Weekly Security deployment window |who={{ircnick|Reedy|Sam}}, {{ircnick|sbassett|Scott}}, {{ircnick|Maryum|Maryum}}, {{ircnick|manfredi|Manfredi}} |what=Held deployment window for Security-team related deploys. }} {{Deployment calendar event card |when=2026-03-30 16:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-30 17:00 SF |length=1 |window=Abstract Wikipedia off-cadence backend deployment |who=Abstract Wikipedia |what=Extra backend deployment to ensure that recent changes work as expected in prod }} {{Deployment calendar event card |when=2026-03-30 19:00 SF |length=1 |window=Automatic branching of MediaWiki, extensions, skins, and vendor – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Branch <code>wmf/1.46.0-wmf.22</code> }} {{Deployment calendar event card |when=2026-03-30 20:00 SF |length=1 |window=Automatic deployment of MediaWiki, extensions, skins, and vendor to testwikis only – see [[Heterogeneous deployment/Train deploys]] |who=N/A |what=Deploy <code>wmf/1.46.0-wmf.22</code> to testwikis }} {{Deployment calendar event card |when=2026-03-30 21:00 SF |length=1 |window=Automatic removal of all obsolete MediaWiki versions from the deployment and bare metal servers (except the most-recent obsolete version) |who=N/A |what=Runs <code>scap clean auto</code> }} {{Deployment calendar event card |when=2026-03-30 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) 
|who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} {{Deployment calendar event card |when=2026-03-30 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-03-31}}=== {{Deployment calendar event card |when=2026-03-31 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-31 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.21->1.46.0-wmf.22|1.46.0-wmf.21|1.46.0-wmf.21}} * group0 to [[mw:MediaWiki_1.46/wmf.22|1.46.0-wmf.22]] * '''Blockers: {{phabricator|T420480}}''' }} {{Deployment calendar event card |when=2026-03-31 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-31 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-03-31 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|Raine|Raine}} {{deploy|type=config|gerrit=1262091|title=Enable $wgTempCategoryCollations for s3 wikis.|status=}} - {{phabricator|T419274}} {{phabricator|T419049}} {{ircnick|xSavitar|xSavitar}} {{deploy|type=1.46.0-wmf.22|gerrit=1265367|title=Set a JWT cookie for OAuth 1 and OAuth 2 owner-only requests|status=}} - {{phabricator|T417833}} {{deploy|type=1.46.0-wmf.22|gerrit=1265368|title=tests: OAuth1 and OAuth2 owner-only JWT support|status=}} - {{phabricator|T417833}} {{phabricator|T415281}} {{deploy|type=1.46.0-wmf.22|gerrit=1265369|title=tests: Add test for asserting JWT cookie not set for OAuth2 consumers|status=}} - {{phabricator|T417833}} {{phabricator|T415281}} {{deploy|type=config|gerrit=1260006|title=Enable JWTs for OAuth1 consumers and OAuth2 owner-only consumers|status=}} - {{phabricator|T417833}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-31 07:00 SF |length=0.5 |window=Test Kitchen UI Deployment Window |who=Experimentation Platform Team |what=Deployment of Test Kitchen UI (fka MPIC) }} {{Deployment calendar event card |when=2026-03-31 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. 
}} {{Deployment calendar event card |when=2026-03-31 08:00 SF |length=1 |window=SRE Collaboration Services office hours |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=Services including Gerrit, Phorge (Phabricator), GitLab }} {{Deployment calendar event card |when=2026-03-31 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|sfaci|sfaci}} {{deploy|type=config|gerrit=1238312|title=Test Kitchen SLOs: Renaming slos because of the Test Kitchen renaming |status=}} - {{phabricator|T414381}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-03-31 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-03-31 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|AaronSchulz|AaronSchulz}} {{deploy|type=config|gerrit=1261732|title=Move all analytics API sandbox entries to testwiki|status=}} - {{phabricator|T419429}} {{ircnick|manfredi|manfredi}} {{deploy|type=1.46.0-wmf.22|gerrit=1264921|title=Email confirmation banner: Add Test Kitchen A/B gating|status=}} - {{phabricator|T421366}} {{deploy|type=1.46.0-wmf.22|gerrit=1264922|title=Add instrumentation for email confirmation lifecycle events|status=}} - {{phabricator|T420007}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-03-31 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-03-31 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} ==={{Deployment_day|date=2026-04-01}}=== {{Deployment calendar event card |when=2026-04-01 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-01 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.22|1.46.0-wmf.21->1.46.0-wmf.22|1.46.0-wmf.21}} * group1 to [[mw:MediaWiki_1.46/wmf.22|1.46.0-wmf.22]] * '''Blockers: {{phabricator|T420480}}''' }} {{Deployment calendar event card |when=2026-04-01 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who={{ircnick|dusen|daniel}}, {{ircnick|effie|effie}} |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
* Daniel deploying REST gateway updates, five patches starting at https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1260763/4 }} {{Deployment calendar event card |when=2026-04-01 04:00 SF |length=1 |window=[[mw:Services|Services]] – [[Citoid]] / [[Zotero]] |who=Marielle ({{ircnick|mvolz}}) |what=See [[mw:Citoid|Citoid]] }} {{Deployment calendar event card |when=2026-04-01 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-01 07:00 SF |length=1 |window=Wikifunctions Services UTC Afternoon |who=Abstract Wikipedia team (Africa, Europe, Eastern Americas) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-01 07:30 SF |length=0.5 |window=Test Kitchen Experiment Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-01 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-01 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=config|gerrit=1266314|title=config: Enable EmailConfirmationBanner on mediawikiwiki|status=}} - {{phabricator|T421366}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-01 14:00 SF |length=1 |window=Wikifunctions Services UTC Late |who=Abstract Wikipedia team (North and South America) |what=Wikifunctions back-end k8s services }} {{Deployment calendar event card |when=2026-04-01 15:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-01 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-01 23:00 SF |length=0.5 |window=Primary database switchover |who={{ircnick|marostegui|Manuel Arostegui}}, {{ircnick|Amir1|Amir}}, {{ircnick|federico3|Federico Ceratto}} |what=Held deployment window for database primary masters maintenance }} ==={{Deployment_day|date=2026-04-02}}=== {{Deployment calendar event card |when=2026-04-02 00:00 SF |length=1 |window=[[Backport windows|UTC morning backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Amir1|Amir}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|awight|Adam}} |what={{ircnick|georgekyz|georgekyz}} {{deploy|type=config|gerrit=1266228|title=EventStreamConfig: Add rr-multilingual prediction_change stream|status=}} - {{phabricator|T415892}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-02 01:00 SF |length=2 |window=MediaWiki train - Utc-0 Version |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=[[mw:MediaWiki 1.46/Roadmap#Schedule for the deployments|1.46 schedule]] {{DeployOneWeekMini|1.46.0-wmf.22|1.46.0-wmf.22|1.46.0-wmf.21->1.46.0-wmf.22}} * group2 to [[mw:MediaWiki_1.46/wmf.22|1.46.0-wmf.22]] * '''Blockers: {{phabricator|T420480}}''' }} {{Deployment calendar event card |when=2026-04-02 03:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC mid-day) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
{{ircnick|dues|DKinzler_(WMF)}} {{deploy|type=chart|gerrit=1266237|title=rest gateway: define authed-user class|status=}} - {{phabricator|T420280}} {{phabricator|T419796}} {{deploy|type=chart|gerrit=1265333|title=introduce policy for abstractwiki/wikifunctions|status=}} - {{phabricator|T421581}} }} {{Deployment calendar event card |when=2026-04-02 05:00 SF |length=1 |window=Mobileapps/RESTBase/Wikifeeds |who=Content Transform Team |what=Content transform team node services (mobileapps/wikifeeds) }} {{Deployment calendar event card |when=2026-04-02 06:00 SF |length=1 |window=[[Backport windows|UTC afternoon backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|Lucas_WMDE|Lucas}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}} |what={{ircnick|manfredi|manfredi}} {{deploy|type=config|gerrit=1261516|title=config: Enable EmailConfirmationBanner on selected wikis|status=}} - {{phabricator|T421366}} {{ircnick|HouseOfM|HouseOfM}} {{deploy|type=config|gerrit=1266964|title=Enable the CampaignEvents extension on incubator|status=}} - {{phabricator|T421749}} {{ircnick|edsanders|edsanders}} {{deploy|type=1.46.0-wmf.22|gerrit=1266985|title=Fix suggestion mode availability check|status=}} - {{phabricator|T422143}} {{ircnick|bwang|bwang}} {{deploy|type=1.46.0-wmf.22|gerrit=1267008|title=Add logged-in reader retention instrument|status=}} - {{phabricator|T420490}} {{ircnick|kostajh|kostajh}} {{deploy|type=1.46.0-wmf.22|gerrit=1267056|title=hCaptcha: Emit Prometheus counter on health check failover|status=}} - {{phabricator|T421204}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-02 07:00 SF |length=1 |window=DC Switchover: Day 8 - Codfw Repool |who={{ircnick|jasmine_}} |what=Codfw Repool }} {{Deployment calendar event card |when=2026-04-02 07:30 SF |length=0.5 |window=Test Kitchen Experiment 
Deployment Window |who=Test Kitchen |what=Automatic start/stop of active experiments and instruments managed by [https://wikitech.wikimedia.org/wiki/Test_Kitchen Test Kitchen]. }} {{Deployment calendar event card |when=2026-04-02 08:00 SF |length=1 |window=Train log triage |who={{ircnick|jnuche|Jaime}}, {{ircnick|hashar|Antoine}} |what=See [[Heterogeneous deployment/Train deploys#Breakage]] }} {{Deployment calendar event card |when=2026-04-02 09:00 SF |length=1 |window=[[Puppet request window]]<br/><small>'''(Max 6 patches)'''</small> |who={{ircnick|jhathaway|JHathaway}}, {{ircnick|rzl|Reuven}} |what={{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to Puppet change'' }} {{Deployment calendar event card |when=2026-04-02 10:00 SF |length=1 |window=Cloud Services/Technical Documentation weekly deploy (Toolhub, Developer portal, Striker) |who={{ircnick|bd808}} |what=... }} {{Deployment calendar event card |when=2026-04-02 10:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC late) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. 
}} {{Deployment calendar event card |when=2026-04-02 13:00 SF |length=1 |window=[[Backport windows|UTC late backport window]]<br/><small>'''Your patch may or may not be deployed at the sole discretion of the deployer'''</small> |who={{ircnick|RoanKattouw|Roan}}, {{ircnick|Urbanecm|Martin}}, {{ircnick|TheresNoTime|Sammy}}, {{ircnick|kindrobot|Stef}}, {{ircnick|cjming|Clare}} |what={{ircnick|nya_1F616EMO|1F616EMO}} {{deploy|type=config|gerrit=1264569|title=zhwikinews: 20th anniversary logo change|status=}} - {{phabricator|T420165}} {{deploy|type=config|gerrit=1265959|title=arbcom_zhwiki: Enable SecurePoll without PII rights|status=}} - {{phabricator|T419309}} {{ircnick|bwang|bwang}} {{deploy|type=1.46.0-wmf.22|gerrit=1267008|title=Add logged-in reader retention instrument|status=}} - {{phabricator|T420490}} {{ircnick|kemayo|David L}} {{deploy|type=1.46.0-wmf.22|gerrit=1267204|title=SuggestedLinkEditCheck: fetchSuggestions return a jQuery.Promise|status=}} {{ircnick|irc-nickname|Requesting Developer}} * ''Gerrit link to backport or config change'' }} {{Deployment calendar event card |when=2026-04-02 14:00 SF |length=1 |window=Web Team deployment window |who=Web Team |what=NOTE: often skipped, the web team does not typically check IRC so assume this is not being used if 5 minutes past the start }} {{Deployment calendar event card |when=2026-04-02 23:00 SF |length=1 |window=[[MediaWiki_On_Kubernetes#How_to_manage_changes_to_the_infrastructure|MediaWiki infrastructure]] (UTC early) |who=SRE team |what=MediaWiki-related infrastructure changes that need a kubernetes deployment. }} ==={{Deployment_day|date=2026-04-03}}=== {{Deployment calendar event card |when=2026-04-03 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. 
|who= |what=No Deploys }} {{Deployment calendar event card |when=2026-04-03 04:00 SF |length=0.5 |window=GitLab version upgrades |who={{ircnick|jelto|Jelto}}, {{ircnick|arnoldokoth|Arnold}}, {{ircnick|mutante|Daniel}}, {{ircnick|arnaudb|Arnaud}} |what=GitLab version upgrades }} ==={{Deployment_day|date=2026-04-04}}=== {{Deployment calendar event card |when=2026-04-04 00:00 SF |length=24 |window=No deploys all day! See [[Deployments/Emergencies]] if things are broken. |who= |what=No Deploys }} b3orww7qgdvzo68cv20dtngayexet45 Help:SSH Fingerprints/bast4006.wikimedia.org 12 460003 2398844 2396708 2026-04-04T04:52:17Z Quiddity 1884 lang=text 2398844 wikitext text/x-wiki <syntaxhighlight lang=text> +---------+---------+-----------------------------------------------------+ | Cipher | Algo | Fingerprint | +---------+---------+-----------------------------------------------------+ | RSA | SHA-256 | SHA256:TIUlCpJxdDlWxCq/6/ut9MD5S3ITTtS8ktWi/oxZc9Q | +---------+---------+-----------------------------------------------------+ | ECDSA | SHA-256 | SHA256:iidoMJrH4PBn8gd6dKAWUK2zMfX1gQ9Uvpg/ehbthSM | +---------+---------+-----------------------------------------------------+ | ED25519 | SHA-256 | SHA256:69BCpJB01GXCCX01rGvuBmJix6Dv/QJIkAzFRT9J0u4 | +---------+---------+-----------------------------------------------------+ +---[RSA 2048]----+ +---[ECDSA 256]---+ +--[ED25519 256]--+ | o++ .=+oo | | ... ..o. | |++o+B*ooooo | | .o o+.oo o . | |. o +.. | |oo.o +*+. .. | | ..o. . = . | | . o . . +.. | |. o .=. . | | . .o . + o . | | * . . oo. | | ... o... | | o S * . . E| |= B . S .. . | |.o o... S. | | .. = o . | |+B+.o... .E + . | |o + =.E.o. | | .* * o . | |ooBo=oo o+ o | | o = .o+o | | .. O B o | | o.=.o. .o.. | | ... +o | | .++o.B.o | | ... .o | | .. 
.o.oo | +----[SHA256]-----+ +----[SHA256]-----+ +----[SHA256]-----+ </syntaxhighlight> nd6oycahykffta4sv9xgg4spnlys26p Public baremetal hosts in core sites 0 460023 2398843 2398487 2026-04-04T04:50:41Z Quiddity 1884 lang=bash 2398843 wikitext text/x-wiki
== Current state (2026) ==
The previous network design used Juniper’s virtual-chassis to stretch vlans across each of rows A to D. We’re progressively migrating private baremetal hosts to per-rack vlans for easier management, better segregation (smaller blast radius) and an overall leaner design. This is possible as private IPs are not scarce.

VMs (both private and public) will be treated independently through the Routed Ganeti project (in prod in 3 POPs).

POPs have few public baremetal hosts and only 2 racks each, so it’s fine to allocate a public range to each rack.

That leaves us with the baremetal public hosts in the core sites.

'''This document defines the various options, their tradeoffs and the chosen one.'''

=== Some data ===
EQIAD has 18 public baremetal hosts, CODFW has 13.<syntaxhighlight lang=bash>
cumin1003:~$ sudo cumin 'P{F:fqdn ~ ".wikimedia.org$"} and not A:vms and A:eqiad and not A:misc-nonprod'
</syntaxhighlight>In eqiad the public subnets consist of 3x /26 (~25-32% usage) and 1x /27 (~60% usage). In codfw, 4x /27 are allocated (~56-66% usage).

Keep in mind that these subnets are also used for public VMs. Those public VMs will use their own range once migrated to routed Ganeti (one /26 is reserved for the codfw AB routed Ganeti cluster).

The largest “cluster” is the DNS servers, made of 3 nodes in each site.

== Path forward ==

=== Option 1 - VXLAN ===
This option means keeping the current status quo of mimicking the row-wide vlan using a VXLAN overlay.
Pros:
* Easiest to implement (already implemented)
* Hosts can stay where they are, new hosts can be racked anywhere in the row
* 4 public ranges for server placement, 2 or 4 failure domains depending on how we look at it

Cons:
* Requires an overlay (VXLAN) - more complex to troubleshoot, more prone to bugs, higher license cost (less so with Nokia)
* Will eventually require shrinking the /26 and /27 to smaller ranges to not waste public IPs once VMs are on routed Ganeti
* Can’t add public hosts in rows E/F without major changes (or a mixed setup)

=== Option 2 - public vlan on selected switches ===
As having a public vlan on each switch, like we do for the private vlans, would waste too many public IPs, this option is to only create a public vlan on some switches. The exact number will depend on the rack diversity needed for the services: at least 3 (one in fabric AB, one in fabric CD and one in EF), and more if needed.

Pros:
* Pure routing (no overlay), in some ways similar to the POPs
* Can add public hosts in rows E/F

Cons:
* Requires physically moving and re-IPing most of the servers
* Constrains server placement

=== Option 3 - /32 IPs and <code>scope link</code> ===
'''I’m not sure yet if it’s even possible.'''

Similarly to what we’re doing with routed Ganeti VMs, we could imagine the baremetal servers having a /32 IPv4 address as well as a /128 IPv6 address. The ToR switch would have a configuration similar to the Ganeti hypervisors: a static route pointing to the server-facing interface. This route would be redistributed in the network to ensure reachability.

Pros:
* No server location constraints

Cons:
* Requires thorough investigation, maybe not even possible

== Conclusion ==
Netops’ preference goes to option 2, but option 3 could be interesting to investigate if we had more time.

Option 2 would mean 3 racks with 6 hosts each in eqiad, and ~4 hosts each in codfw. We would allocate a /28 (16 IPs) to each rack.
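
The /28 sizing can be sanity-checked with a short sketch. It is only an illustration under stated assumptions: the parent range <code>198.51.100.0/24</code> is a documentation prefix (RFC 5737) standing in for the real public ranges, which this page does not give, and reserving one gateway address per vlan is a guess, not the actual switch configuration.

```python
import ipaddress

# Hypothetical parent range from the RFC 5737 documentation space;
# the real public prefixes are not given in this document.
PARENT = ipaddress.ip_network("198.51.100.0/24")

# Figures taken from the text: 18 public baremetal hosts in eqiad,
# spread over 3 racks under option 2.
EQIAD_HOSTS = 18
RACKS = 3


def usable_hosts(prefix_len: int, reserved: int = 1) -> int:
    """Addresses left for servers in an IPv4 subnet of the given length.

    Subtracts the network and broadcast addresses plus `reserved`
    addresses (assumed here: a single gateway on the switch).
    """
    total = 2 ** (32 - prefix_len)
    return total - 2 - reserved


per_rack = EQIAD_HOSTS // RACKS

# Carve the first three /28s out of the placeholder parent range.
racks_28 = list(PARENT.subnets(new_prefix=28))[:RACKS]

for net in racks_28:
    print(net, net.num_addresses, "addresses,", usable_hosts(net.prefixlen), "usable")

print("hosts per rack:", per_rack)
print("headroom per /28:", usable_hosts(28) - per_rack)
```

Under these assumptions a /28 leaves 13 addresses for servers, so 6 hosts per rack fit with room to grow, whereas a /29 (5 usable addresses) would already be too small for eqiad.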
We would start with rows E/F, then work with the service owners and DCops on the A-D migration.

== Implementation ==
Create public vlan on eqiad and codfw pods E/F - https://phabricator.wikimedia.org/T422043
4mrqpuc8cljjxn500mspe6zyni0fdry